S Schbath ****
Finding motifs with unusual statistics, i.e. words that occur much more or much less often than expected. Can be run on a set of words, not just on a single word given as letters. Restriction sites are not common, or the DNA would be cleaved too often - e.g. EcoRI sites. The Chi motif is very common as it protects DNA from enzymatic degradation. Promoter regions are also uncommon.
Chi motifs are species-specific. Their distribution is skewed between strands, so you can compare counts of the sequence against counts of its reverse complement to measure the level of skew.
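A minimal sketch of the strand-skew check, assuming the E. coli Chi sequence GCTGGTGG as the example word. The function names and the toy sequence are made up for illustration and are not part of the software described in the talk.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def strand_skew(seq, word):
    """Return (forward count, reverse count, skew) for word on one strand.

    An occurrence of the reverse complement on this strand corresponds to
    an occurrence of the word itself on the opposite strand.
    Skew is (fwd - rev) / (fwd + rev), or 0.0 if the word never occurs.
    """
    fwd = count_overlapping(seq, word)
    rev = count_overlapping(seq, reverse_complement(word))
    total = fwd + rev
    skew = (fwd - rev) / total if total else 0.0
    return fwd, rev, skew


# Toy sequence (invented): two Chi sites on this strand, one on the other.
toy = "GCTGGTGGAAACCACCAGCTTGCTGGTGG"
fwd, rev, skew = strand_skew(toy, "GCTGGTGG")
```

A skew near +1 or -1 means the word sits almost entirely on one strand; a value near 0 means both strands carry it about equally.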
Need Gaussian statistics for high word frequencies and Poisson-based models for low frequencies. This allows you to use z-tests for short words, which are frequent. The distribution is set on the command line.
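To make the two regimes concrete, here is a rough sketch under strong simplifying assumptions: letters are treated as independent (an order-0 model), and the variance ignores word self-overlap and parameter estimation. The tool from the talk presumably uses more refined models, so all names and formulas below are illustrative only.

```python
import math
from collections import Counter


def count_overlapping(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    last_start = len(seq) - len(word) + 1
    return sum(1 for i in range(last_start) if seq.startswith(word, i))


def expected_count(seq, word):
    """Expected count of word assuming independent letters.

    Returns (expected count, per-position probability of the word).
    """
    freqs = Counter(seq)
    n = len(seq)
    p = 1.0
    for letter in word:
        p *= freqs[letter] / n
    return (n - len(word) + 1) * p, p


def gaussian_z(observed, expected, p):
    """z-score for frequent words, with a binomial-style variance.

    Simplification: ignores overlap between neighbouring occurrences.
    Requires expected > 0.
    """
    return (observed - expected) / math.sqrt(expected * (1 - p))


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), for rare words."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1)
    return max(0.0, 1.0 - cdf)


seq = "ACGT" * 25
observed = count_overlapping(seq, "ACGT")
expected, p = expected_count(seq, "ACGT")
z = gaussian_z(observed, expected, p)
p_value = poisson_upper_tail(observed, expected)
```

With a large expected count the z-score is the natural summary; with a small expected count the Poisson tail probability is the safer one.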
Can compare how exceptional a word is between two regions, organisms, etc.
- H0: equally exceptional in both sequences
- H1: more exceptional in the first sequence
- To run the comparison, add -seq2 to the command line.