public class Motif extends java.lang.Object implements Binnable, java.lang.Cloneable, java.lang.Comparable<Motif>, java.io.Serializable
Alphabet object so that motifs can be compared and merged properly. A sig score is a score given to a motif to assess its biological significance in relation to a given question.
A flag is used to keep track of whether a motif is counted on both strands of the DNA, or just on the positive strand. The value of this flag is referred to as the reverse complement status since it determines whether the reverse complement is used in performing searches on DNA. When motifs are converted to and from text, 'R' and 'r' correspond to this flag being true or false, respectively.
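The text form of the flag can be illustrated with a minimal sketch. It assumes the separator is a comma, as the sequence,R/r form described for getFlaggedSequence() suggests; the class and method names below are hypothetical and are not part of Motif.

```java
/** Hypothetical helper illustrating the sequence-plus-flag text form; not part of Motif. */
final class FlaggedSequenceSketch {
    // Assumption: a comma separates the sequence from the flag.
    static final String SEPARATOR = ",";

    /** Appends 'R' when the reverse complement is used, 'r' when it is not. */
    static String format(String sequence, boolean revComp) {
        return sequence + SEPARATOR + (revComp ? "R" : "r");
    }

    /** Returns the sequence part of a flagged string. */
    static String sequenceOf(String flagged) {
        return flagged.substring(0, flagIndex(flagged) - SEPARATOR.length());
    }

    /** Returns true for a trailing 'R' and false for a trailing 'r'. */
    static boolean revCompOf(String flagged) {
        return flagged.charAt(flagIndex(flagged)) == 'R';
    }

    // Validates the overall shape and returns the index of the flag character.
    private static int flagIndex(String flagged) {
        int i = flagged.length() - 1;
        boolean hasFlag = i >= SEPARATOR.length()
                && (flagged.charAt(i) == 'R' || flagged.charAt(i) == 'r')
                && flagged.startsWith(SEPARATOR, i - SEPARATOR.length());
        if (!hasFlag) {
            throw new IllegalArgumentException("Expected <sequence>" + SEPARATOR + "R or r: " + flagged);
        }
        return i;
    }

    public static void main(String[] args) {
        String text = format("ACGT", true);
        System.out.println(text + " -> " + sequenceOf(text) + ", revComp=" + revCompOf(text));
    }
}
```

The sketch ignores the alphabetically-last ordering that getFlaggedSequence() applies to motifs that use the reverse complement.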
Motifs are more or less immutable. The only method that can change a motif is stripNs(), which usually isn't called until the end of a program run.
Note that Comparable.compareTo(Object) is not consistent with equals(Object) or with equivalentTo(Motif). This is because under most circumstances motifs are sorted by score, but are otherwise compared for equivalence based on sequence characteristics. A variety of subclasses of MotifComparator provide alternate ways of comparing motifs.
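The practical consequence of an ordering that is inconsistent with equals shows up in sorted collections. The sketch below uses a hypothetical stand-in class (ScoredSeq, not Motif itself) that orders by descending score and tests equality by sequence, which is how the text above describes Motif; the real class may differ in detail.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration of an ordering that is inconsistent with equals. */
final class OrderingSketch {
    /** Stand-in for Motif: equality by sequence, ordering by score only. */
    static final class ScoredSeq implements Comparable<ScoredSeq> {
        final String sequence;
        final double score;

        ScoredSeq(String sequence, double score) {
            this.sequence = sequence;
            this.score = score;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ScoredSeq && ((ScoredSeq) o).sequence.equals(sequence);
        }

        @Override
        public int hashCode() {
            return sequence.hashCode();
        }

        // Descending by score; the sequence is ignored entirely.
        @Override
        public int compareTo(ScoredSeq other) {
            return Double.compare(other.score, score);
        }
    }

    /** Returns {HashSet size, TreeSet size} after adding two distinct sequences with equal scores. */
    static int[] setSizes() {
        ScoredSeq a = new ScoredSeq("ACGT", 1.0);
        ScoredSeq b = new ScoredSeq("TTTT", 1.0);
        Set<ScoredSeq> hashed = new HashSet<>();
        hashed.add(a);
        hashed.add(b);
        Set<ScoredSeq> sorted = new TreeSet<>();
        sorted.add(a);
        sorted.add(b); // compareTo returns 0, so the TreeSet treats b as a duplicate of a
        return new int[] { hashed.size(), sorted.size() };
    }

    public static void main(String[] args) {
        int[] sizes = setSizes();
        System.out.println("HashSet: " + sizes[0] + ", TreeSet: " + sizes[1]);
    }
}
```

Under these assumptions a TreeSet or TreeMap keyed on such objects silently drops entries with tied scores, so sorting an array or list is the safer use of the natural ordering.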
Modifier and Type | Class and Description |
---|---|
private static class | Motif.LastExpansion |
static class | Motif.ScoreData |
Modifier and Type | Field and Description |
---|---|
private Alphabet | alf |
private java.lang.String | algorithm |
static Motif.ScoreData | defaultScoreData |
private java.lang.String | flaggedSequence |
private static Motif.LastExpansion | lastExpansion |
static boolean | LEFT: Defines the left (5') side of a motif. |
static double | MIN_COVERAGE: For use in determining equivalence. |
static int | MIN_RANDOM_LENGTH: The default minimum length of randomly generated motifs. |
static double | PRIMARY_BASE_PROBABILITY: The probability of choosing A, C, G, or T when generating a base for a random motif. |
static boolean | PRINT_REV_COMP_ALPH_LAST: If true, motifs that use the reverse complement will have their string printed in the alphabetically last order. |
private boolean | revComp |
static boolean | RIGHT: Defines the right (3') side of a motif. |
private double | score |
private static long | scoreDataInstantiationCounter |
private java.lang.String | scoreInfo |
static java.lang.String | SEPARATOR: The character used to separate the sequence from the flag in output and parsing. |
private java.lang.String | sequence |
private static long | serialVersionUID |
Constructor and Description |
---|
Motif(java.lang.String seq, boolean rc, Alphabet alphabet): Creates a motif with sequence seq; rc specifies whether the reverse complement is used. |
Modifier and Type | Method and Description |
---|---|
Motif | addBase(char base, boolean direction) |
static int | ambiguousBaseCountOf(java.lang.String seq, Alphabet alf): Returns the total number of ambiguous bases in the passed-in sequence. |
static double | ambiguousFactorOf(java.lang.String seq, Alphabet alf): Returns the ambiguous factor of a sequence. |
private java.lang.String | bestUnionOf(char[] seq1, char[] seq2): Returns a sequence that is the length of seq1 and is the most unambiguous of all possible char-by-char unions between seq1 and seq2; i.e., if one motif is longer than the other, a sliding window approach is used to try all possible matches. |
java.lang.Object | clone(): Returns a shallow copy of this motif, which is the same as a deep copy since all member data are primitives or immutable objects. |
int | compareTo(Motif m): Allows Arrays.sort() to sort Motifs by score in descending order; not consistent with equals(Object). |
boolean | equals(java.lang.Object o): Returns true if this motif has the same sequence and reverse complement status as o; score is NOT taken into account, nor is alphabet. |
boolean | equivalentTo(Motif m): Returns true if this motif is equivalent to the given motif, with the minimum coverage set to MIN_COVERAGE. |
boolean | equivalentTo(Motif m, double minCoverage): Assesses the equivalence of this motif to the given motif. |
Motif[] | expand(): Returns the unsorted set of unambiguous motifs that this motif represents. |
java.lang.String[] | expandAsStrings(): Returns the unsorted set of unambiguous sequences that this motif represents, without any duplicates. |
Motif[] | expandNeighbors(): Returns an array of all of the unambiguous motifs that are within a Hamming distance of 1 from this motif, without any duplicates. |
Motif[] | expandNeighbors(int hd): Returns an array of all of the unambiguous motifs that are within a Hamming distance of hd from this motif, without any duplicates. |
long | expansionCountOf() |
private long | expansionCountOf(java.lang.String seq): Returns the number of unambiguous sequences that seq represents. |
static long | expansionCountOf(java.lang.String seq, Alphabet alf): Returns the number of unambiguous sequences that seq represents. |
private java.lang.String[] | expansionOf(java.lang.String seq): Returns the unsorted set of unambiguous sequences that this sequence represents. |
boolean | generates(java.lang.String seq): Returns true if and only if seq is an instantiation of this motif. |
java.lang.String | getAlgorithm() |
Alphabet | getAlphabet() |
int | getAmbiguousBaseCount(): Returns the total number of ambiguous bases in this motif's sequence. |
double | getAmbiguousFactor(): Returns the ambiguous factor of a sequence. |
Motif | getFirstHalf(): Returns a new motif whose sequence is the first half of this motif's sequence. |
java.lang.String | getFlaggedSequence(): Returns a string representation of this motif as sequence,R/r; this can be turned back into a motif using parseMotif(String, Alphabet); reverse complement-able motifs are always returned using the alphabetically last sequence. |
int | getInternalNCount(): Returns the number of internal Ns in this sequence. |
int | getNCount(): Returns the number of Ns in this sequence. |
static int | getNCount(java.lang.String seq): Returns the total number of Alphabet.N's in the given string. |
Motif | getPrefix(int i): Returns a new motif whose sequence is the first i characters of this motif's sequence. |
static double | getProbablityOfMatch(java.lang.String seq, Alphabet alf) |
Motif | getRandomInstantiation(): Returns a random instantiation of this motif; the instantiation will have the same alphabet and reverse complement flag. |
Motif[] | getRandomInstantiations(int count): Returns count random instantiations. |
java.util.regex.Pattern | getRegExSearchPattern() |
Motif | getRevComp(): Returns a motif whose sequence is the reverse complement of this sequence, with the same revComp flag and alphabet as this sequence. |
java.lang.String | getRevCompSequence(): Returns the reverse complement sequence, or null if the reverse complement isn't used. |
double | getScore() |
java.lang.String | getScorerInfo() |
int | getSearchedPatternLength() |
java.lang.String | getSequence() |
java.lang.String | getStrippedRevCompSequence(): Returns the reverse complement sequence stripped of leading and trailing N's, or null if the reverse complement isn't used. |
java.lang.String | getStrippedSequence(): Returns the sequence stripped of leading and trailing N's. |
double | getValue(): Same as getScore(). |
int | hashCode(): Simply returns the hashCode value of the getFlaggedSequence() representation. |
private void | initialize() |
private java.lang.String | intersectionOf(char[] seq1, char[] seq2): Returns the intersection of two sequences, which must be the same length, although that is not checked here. |
Motif | intersectionWith(Motif m): Returns the intersection of this motif and m, which must be the same length; if either motif uses the reverse complement, the motif returned is the most unambiguous of the two possible char-by-char intersections between this and m. |
int | length() |
Motif | makeBipartite(int nCount): Returns a copy of this motif with nCount N's inserted in its middle. |
private java.lang.String | makePrettySequence(java.lang.String seq): Returns a sequence with primary bases in caps and ambiguous bases as lowercase slash-separated lists of the base expansion. |
static Motif | makeRandomMotif(int maxLength, boolean revComp, Alphabet alf): Returns a random motif whose length is between MIN_RANDOM_LENGTH and maxLength, inclusive. |
static Motif | makeRandomMotif(int minLength, int maxLength, boolean revComp, Alphabet alf): Returns a random motif whose length is between minLength and maxLength, inclusive. |
static Motif | makeRandomMotif(int minLength, int maxLength, double minAmb, double maxAmb, boolean revComp, Alphabet alf): Returns a random motif whose length is between minLength and maxLength and whose ambiguous factor is between minAmb and maxAmb. |
Motif | matchAmbiguity(Motif mold): Returns a new motif that has the same sequence as this motif, but with ambiguous bases in the same places as the mold motif. |
Motif[] | motifify(java.lang.String[] seqs): Turns an array of sequences into an array of motifs with the same reverse complement status and alphabet as this. |
static Motif | parseMotif(java.lang.String s, Alphabet alf): Generates a motif from the string of the form [sequence]SEPARATOR[r/R]. |
static Motif | parseMotif(java.lang.String s, Alphabet alf, boolean useSequenceAsFlaggedRepresentation) |
static double | probabilityOf(java.lang.String seq, Alphabet alf) |
private java.lang.String | revCompOf(java.lang.String seq): Returns the sequence that is the reverse complement of the given sequence. |
private static java.lang.String | revCompOf(java.lang.String seq, Alphabet alf) |
Motif | scrambleBases(): Returns a randomly scrambled version of this motif. |
static java.lang.String | sequenceRegExOf(java.lang.String sequence, Alphabet alf) |
static java.util.regex.Pattern | sequenceRegExPatternOf(java.lang.String sequence, Alphabet alf) |
void | setAlgorithm(java.lang.String algorithm) |
Motif | setBase(char b, int p): Returns a new Motif that is a clone of this motif, but with base b in position p. |
Motif | setFlag(boolean b): Returns a new motif that is a clone of this motif, but with the specified flag. |
void | setRevComp(boolean b) |
void | setScore(double pScore, java.lang.String pScoreInfo) |
private void | setSequence(java.lang.String newSequence) |
void | stripNs(): Strips all leading and trailing N's from the sequence. |
private static java.lang.String | stripNsFrom(java.lang.String seq): Strips all leading and trailing N's from a sequence that contains at least one non-N character. |
int | strippedLength(): Returns the length of the sequence not including leading and trailing N's. |
java.lang.String | toPrettyString(): Returns the sequence with primary bases in caps and ambiguous bases as lowercase slash-separated lists of the base expansion, plus "," followed by the expansion of the reverse complement if it is used. |
java.lang.String | toString(): Returns the score followed by a tab and getFlaggedSequence(). |
private java.lang.String | unionOf(char[] seq1, int seq1StartIndex, char[] seq2, int seq2StartIndex, int length): Returns the char-by-char union of seq1 and seq2[seq2StartIndex] through seq2[seq2StartIndex+length], where seq1[seq1StartIndex] through seq1[seq1StartIndex+length] is matched up with seq2[seq2StartIndex] through seq2[seq2StartIndex+length], and other chars are the same as in seq1. |
Motif | unionWith(Motif m): Returns the union of this motif and m; the motif returned is the length of this motif and is the most unambiguous of all possible char-by-char unions between this and m. |
boolean | useRevComp(): Returns true if this motif uses the reverse complement of the sequence. |
private static Motif.LastExpansion lastExpansion
public static final boolean LEFT
public static final double MIN_COVERAGE
public static final int MIN_RANDOM_LENGTH
public static final double PRIMARY_BASE_PROBABILITY
public static final boolean PRINT_REV_COMP_ALPH_LAST
public static final boolean RIGHT
public static final java.lang.String SEPARATOR
private static final long serialVersionUID
private final Alphabet alf
private java.lang.String algorithm
private transient java.lang.String flaggedSequence
private boolean revComp
public static final Motif.ScoreData defaultScoreData
private double score
private java.lang.String scoreInfo
private java.lang.String sequence
private static long scoreDataInstantiationCounter
public Motif(java.lang.String seq, boolean rc, Alphabet alphabet)
public static int ambiguousBaseCountOf(java.lang.String seq, Alphabet alf)
public static double ambiguousFactorOf(java.lang.String seq, Alphabet alf)
public static long expansionCountOf(java.lang.String seq, Alphabet alf)
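As a rough illustration of what an expansion count means, the sketch below multiplies the number of primary bases each symbol stands for. It assumes the standard IUPAC nucleotide codes; the actual Alphabet class may define a different symbol set, and the class name here is hypothetical.

```java
/** Hypothetical sketch: counts the unambiguous sequences an IUPAC sequence represents. */
final class ExpansionCountSketch {
    /** Number of primary bases (A, C, G, T) that an IUPAC code stands for. */
    static int degeneracy(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': case 'C': case 'G': case 'T':
                return 1;
            case 'R': case 'Y': case 'S': case 'W': case 'K': case 'M':
                return 2;
            case 'B': case 'D': case 'H': case 'V':
                return 3;
            case 'N':
                return 4;
            default:
                throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
    }

    /** Product of the per-position degeneracies; throws ArithmeticException on long overflow. */
    static long expansionCount(String seq) {
        long count = 1;
        for (int i = 0; i < seq.length(); i++) {
            count = Math.multiplyExact(count, (long) degeneracy(seq.charAt(i)));
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(expansionCount("ANRT")); // 1 * 4 * 2 * 1
    }
}
```

Because the count grows multiplicatively with each ambiguous position, checking it before enumerating every expansion is a cheap way to avoid allocating a very large array.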
public static int getNCount(java.lang.String seq)
Returns the total number of Alphabet.N's in the given string.
public static double getProbablityOfMatch(java.lang.String seq, Alphabet alf)
public static Motif makeRandomMotif(int maxLength, boolean revComp, Alphabet alf)
Returns a random motif whose length is between MIN_RANDOM_LENGTH and maxLength, inclusive.
public static Motif makeRandomMotif(int minLength, int maxLength, boolean revComp, Alphabet alf)
public static Motif makeRandomMotif(int minLength, int maxLength, double minAmb, double maxAmb, boolean revComp, Alphabet alf)
public static Motif parseMotif(java.lang.String s, Alphabet alf)
useCanonicalRepresentation -
public static Motif parseMotif(java.lang.String s, Alphabet alf, boolean useSequenceAsFlaggedRepresentation)
public static double probabilityOf(java.lang.String seq, Alphabet alf)
private static java.lang.String revCompOf(java.lang.String seq, Alphabet alf)
public static java.lang.String sequenceRegExOf(java.lang.String sequence, Alphabet alf)
public static java.util.regex.Pattern sequenceRegExPatternOf(java.lang.String sequence, Alphabet alf)
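One plausible way to turn an ambiguous sequence into a search pattern is to map each symbol to a regular-expression character class. The sketch below assumes the standard IUPAC nucleotide codes and does not claim to reproduce the exact string that sequenceRegExOf returns; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: builds a regular expression from an IUPAC nucleotide sequence. */
final class SequenceRegexSketch {
    /** Returns the regex fragment matching every primary base the code stands for. */
    static String classFor(char code) {
        switch (Character.toUpperCase(code)) {
            case 'A': case 'C': case 'G': case 'T':
                return String.valueOf(Character.toUpperCase(code));
            case 'R': return "[AG]";
            case 'Y': return "[CT]";
            case 'S': return "[CG]";
            case 'W': return "[AT]";
            case 'K': return "[GT]";
            case 'M': return "[AC]";
            case 'B': return "[CGT]";
            case 'D': return "[AGT]";
            case 'H': return "[ACT]";
            case 'V': return "[ACG]";
            case 'N': return "[ACGT]";
            default:
                throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
    }

    /** Concatenates the per-symbol fragments and compiles the result. */
    static Pattern patternOf(String seq) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < seq.length(); i++) {
            regex.append(classFor(seq.charAt(i)));
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = patternOf("ARGT");
        System.out.println(p.pattern() + " found in TTAGGTCC: " + p.matcher("TTAGGTCC").find());
    }
}
```

With such a pattern, matches() tests whether a whole string is an instantiation of the sequence, while find() locates occurrences inside a longer string.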
private static java.lang.String stripNsFrom(java.lang.String seq)
public Motif addBase(char base, boolean direction)
private java.lang.String bestUnionOf(char[] seq1, char[] seq2)
public java.lang.Object clone()
Overrides: clone in class java.lang.Object
public int compareTo(Motif m)
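Allows Arrays.sort() to sort Motifs by score in descending order; not consistent with equals(Object).
Specified by: compareTo in interface java.lang.Comparable<Motif>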
public boolean equals(java.lang.Object o)
Overrides: equals in class java.lang.Object
public boolean equivalentTo(Motif m)
Returns true if this motif is equivalent to the given motif, with the minimum coverage set to MIN_COVERAGE.
public boolean equivalentTo(Motif m, double minCoverage)
public Motif[] expand()
public java.lang.String[] expandAsStrings()
public Motif[] expandNeighbors()
public Motif[] expandNeighbors(int hd)
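The idea of a Hamming neighborhood can be sketched with a short recursive enumeration. The example assumes an unambiguous input over the primary bases A, C, G, and T, and it counts the input itself (distance 0) as part of the neighborhood; whether expandNeighbors does the same is not stated here. The class name is hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: enumerates sequences within a given Hamming distance. */
final class HammingNeighborsSketch {
    private static final char[] BASES = {'A', 'C', 'G', 'T'};

    /** Returns every sequence that differs from seq in at most hd positions, without duplicates. */
    static Set<String> neighbors(String seq, int hd) {
        if (hd < 0) {
            throw new IllegalArgumentException("hd must be non-negative: " + hd);
        }
        Set<String> out = new TreeSet<>();
        collect(seq.toCharArray(), 0, hd, out);
        return out;
    }

    // At each position, either keep the base or spend one substitution on a different base.
    private static void collect(char[] seq, int pos, int remaining, Set<String> out) {
        if (pos == seq.length) {
            out.add(new String(seq));
            return;
        }
        collect(seq, pos + 1, remaining, out);
        if (remaining > 0) {
            char original = seq[pos];
            for (char base : BASES) {
                if (base != original) {
                    seq[pos] = base;
                    collect(seq, pos + 1, remaining - 1, out);
                }
            }
            seq[pos] = original; // restore before returning to the caller
        }
    }

    public static void main(String[] args) {
        System.out.println(neighbors("ACG", 1));
    }
}
```

For a sequence of length 3 and hd = 1 this yields the input plus 3 substitutions at each of 3 positions, 10 sequences in total.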
public long expansionCountOf()
private long expansionCountOf(java.lang.String seq)
private java.lang.String[] expansionOf(java.lang.String seq)
public boolean generates(java.lang.String seq)
public java.lang.String getAlgorithm()
public Alphabet getAlphabet()
public int getAmbiguousBaseCount()
public double getAmbiguousFactor()
public Motif getFirstHalf()
public java.lang.String getFlaggedSequence()
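Returns a string representation of this motif as sequence,R/r; this can be turned back into a motif using parseMotif(String, Alphabet); reverse complement-able motifs are always returned using the alphabetically last sequence.
public int getInternalNCount()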
public int getNCount()
public Motif getPrefix(int i)
public Motif getRandomInstantiation()
public Motif[] getRandomInstantiations(int count)
public java.util.regex.Pattern getRegExSearchPattern()
public Motif getRevComp()
public java.lang.String getRevCompSequence()
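A reverse complement over an ambiguous alphabet reverses the sequence and swaps each symbol for the code of its complementary base set. The sketch below assumes the standard IUPAC nucleotide codes and mirrors the null-when-unused behavior described for getRevCompSequence(); the class and method names are hypothetical.

```java
/** Hypothetical sketch: reverse complement of an IUPAC nucleotide sequence. */
final class RevCompSketch {
    // Each code in CODES is complemented by the code at the same index in COMPLEMENTS.
    private static final String CODES = "ACGTRYSWKMBDHVN";
    private static final String COMPLEMENTS = "TGCAYRSWMKVHDBN";

    /** Walks the sequence from the end, appending the complement of each symbol. */
    static String revComp(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            int idx = CODES.indexOf(Character.toUpperCase(seq.charAt(i)));
            if (idx < 0) {
                throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + seq.charAt(i));
            }
            sb.append(COMPLEMENTS.charAt(idx));
        }
        return sb.toString();
    }

    /** Returns null when the reverse complement is not used. */
    static String revCompOrNull(String seq, boolean useRevComp) {
        return useRevComp ? revComp(seq) : null;
    }

    public static void main(String[] args) {
        System.out.println(revComp("AACG")); // prints CGTT
    }
}
```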
public double getScore()
public java.lang.String getScorerInfo()
public int getSearchedPatternLength()
public java.lang.String getSequence()
public java.lang.String getStrippedRevCompSequence()
public java.lang.String getStrippedSequence()
public int hashCode()
Simply returns the hashCode value of the getFlaggedSequence() representation.
Overrides: hashCode in class java.lang.Object
private void initialize()
private java.lang.String intersectionOf(char[] seq1, char[] seq2)
public Motif intersectionWith(Motif m)
public int length()
public Motif makeBipartite(int nCount)
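A bipartite motif can be pictured as two halves separated by a run of N's. The sketch below splits at length / 2, so for odd lengths the second half is one base longer; that split point is an assumption, since the exact position used by makeBipartite is not documented here. The class name is hypothetical.

```java
/** Hypothetical sketch: inserts a run of N's into the middle of a sequence. */
final class BipartiteSketch {
    static String makeBipartite(String seq, int nCount) {
        if (nCount < 0) {
            throw new IllegalArgumentException("nCount must be non-negative: " + nCount);
        }
        int mid = seq.length() / 2; // assumption: integer division picks the split point
        StringBuilder sb = new StringBuilder(seq.length() + nCount);
        sb.append(seq, 0, mid);
        for (int i = 0; i < nCount; i++) {
            sb.append('N');
        }
        sb.append(seq, mid, seq.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(makeBipartite("ACGT", 3)); // prints ACNNNGT
    }
}
```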
private java.lang.String makePrettySequence(java.lang.String seq)
public Motif matchAmbiguity(Motif mold)
Example: acgt molded to stgh may produce wcgb or mcgd.
Throws: java.lang.IllegalArgumentException - if mold is not the same length as this motif.
public Motif[] motifify(java.lang.String[] seqs)
private java.lang.String revCompOf(java.lang.String seq)
public Motif scrambleBases()
public void setAlgorithm(java.lang.String algorithm)
public Motif setBase(char b, int p)
public Motif setFlag(boolean b)
public void setRevComp(boolean b)
public void setScore(double pScore, java.lang.String pScoreInfo)
private void setSequence(java.lang.String newSequence)
public void stripNs()
public int strippedLength()
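Stripping and the related counts can be sketched on plain strings. The example treats both 'N' and 'n' as the N symbol, which is an assumption, and returns an empty string for an all-N input, a case the documented stripNsFrom explicitly does not cover. The class and method names are hypothetical.

```java
/** Hypothetical sketch: stripping leading and trailing N's from a sequence. */
final class StripNsSketch {
    private static boolean isN(char c) {
        return c == 'N' || c == 'n';
    }

    /** Removes leading and trailing N's; internal N's are kept. */
    static String stripNs(String seq) {
        int start = 0;
        int end = seq.length();
        while (start < end && isN(seq.charAt(start))) {
            start++;
        }
        while (end > start && isN(seq.charAt(end - 1))) {
            end--;
        }
        return seq.substring(start, end);
    }

    /** Length of the sequence without its leading and trailing N's. */
    static int strippedLength(String seq) {
        return stripNs(seq).length();
    }

    /** Counts the N's that remain after stripping, i.e. the internal ones. */
    static int internalNCount(String seq) {
        int count = 0;
        for (char c : stripNs(seq).toCharArray()) {
            if (isN(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String seq = "NNACNGTN";
        System.out.println(stripNs(seq) + " " + strippedLength(seq) + " " + internalNCount(seq));
    }
}
```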
public java.lang.String toPrettyString()
public java.lang.String toString()
Returns the score followed by a tab and getFlaggedSequence().
Overrides: toString in class java.lang.Object
private java.lang.String unionOf(char[] seq1, int seq1StartIndex, char[] seq2, int seq2StartIndex, int length)
public Motif unionWith(Motif m)
If one motif is longer than the other, a sliding window approach is used to try all possible matches. If either motif is set to use the reverse complement, the reverse complement of m is also tried in all possible matches (and the returned motif is set to use the reverse complement).
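The sliding-window idea can be sketched as follows. The example assumes the standard IUPAC nucleotide codes, represents each code as a bit mask over A, C, G, and T, takes "most unambiguous" to mean the lowest expansion count, keeps the first window on ties, and requires the second sequence to be no longer than the first. It does not try the reverse complement, and it is not the actual unionWith implementation; all names are hypothetical.

```java
/** Hypothetical sketch: least ambiguous char-by-char union over all window offsets. */
final class SlidingUnionSketch {
    // Index = bit mask (A=1, C=2, G=4, T=8); value = IUPAC code for that base set.
    private static final String BY_MASK = "?ACMGRSVTWYHKDBN";

    static int mask(char code) {
        int m = BY_MASK.indexOf(Character.toUpperCase(code));
        if (m <= 0) {
            throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
        return m;
    }

    /** Union of two codes: the code standing for every base either one allows. */
    static char union(char a, char b) {
        return BY_MASK.charAt(mask(a) | mask(b));
    }

    /** Number of unambiguous sequences represented; used as the ambiguity measure. */
    static long expansionCount(String seq) {
        long n = 1;
        for (char c : seq.toCharArray()) {
            n *= Integer.bitCount(mask(c));
        }
        return n;
    }

    /** Slides shorter along longer and returns the union with the lowest expansion count. */
    static String bestUnion(String longer, String shorter) {
        if (shorter.length() > longer.length()) {
            throw new IllegalArgumentException("Second sequence must not be longer than the first");
        }
        String best = null;
        long bestCount = Long.MAX_VALUE;
        for (int offset = 0; offset + shorter.length() <= longer.length(); offset++) {
            char[] candidate = longer.toCharArray(); // positions outside the window stay as in longer
            for (int i = 0; i < shorter.length(); i++) {
                candidate[offset + i] = union(longer.charAt(offset + i), shorter.charAt(i));
            }
            String s = new String(candidate);
            long count = expansionCount(s);
            if (count < bestCount) {
                best = s;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestUnion("ACGTA", "GTA"));
    }
}
```

In the usage above, the window at offset 2 aligns GTA with the last three bases of ACGTA exactly, so the union adds no ambiguity and that candidate wins.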
public boolean useRevComp()