Our model for judging conservation is described in our paper above. Using a multi-species alignment, each microRNA seed site is evaluated by the number of species in which it is conserved. This conservation is then scored against a context-specific background model and assigned a p-value. The p-value gives the probability that the site would be conserved at least as widely as it is under the background model, in which there is no selective pressure on the seed site beyond selection on the protein sequence and on codon choice. Seed sites are classified as 'conserved' if the p-value is below a cutoff value, chosen so that the false positive rate on the set of conserved targets is below 50%.
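As a rough illustration of the p-value and cutoff step, the sketch below uses a simple empirical background: a list of how many species each background site is conserved in. This is an assumption made for illustration only; the actual background model is context-specific and is described in the paper. The function names, the background counts, and the cutoff passed in are all hypothetical.

```python
def conservation_p_value(observed_species, background_counts):
    """Fraction of background sites conserved in at least as many species
    as the observed site (an empirical stand-in for the real model)."""
    if not background_counts:
        raise ValueError("background_counts must not be empty")
    at_least = sum(1 for count in background_counts if count >= observed_species)
    return at_least / len(background_counts)


def is_conserved(p_value, cutoff):
    """Classify a seed site as 'conserved' when its p-value is below the cutoff."""
    return p_value < cutoff


# Made-up background: species counts for ten hypothetical background sites.
background = [1, 2, 2, 3, 3, 4, 5, 6, 8, 10]
p = conservation_p_value(8, background)
```

In this toy example, two of the ten background sites are conserved in eight or more species, so the site would be called conserved only under a cutoff larger than that fraction.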
Determining Conserved Sites
To rank the confidence of target predictions, we have modified the methodology of Friedman et al. (Genome Research (2009) 19:92-105) to produce a 'Probability of Conserved Targeting' for each target transcript. The 'Probability of Conserved Targeting' takes into account the number of sites in each transcript, the p-value for each site (a lower p-value means a higher 'Probability of Conserved Targeting'), and the targeting microRNA (some microRNAs show much greater evidence for conserved targeting than others).
The 'Probability of Conserved Targeting' gives our estimate for the probability that at least one of the conserved seed sites within a transcript (for a specific microRNA) has been conserved due to selective pressure, and not merely by chance.
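One way to picture an "at least one site" probability is the sketch below. It assumes that each site already has its own probability of being conserved due to selection and that the sites are independent; both are simplifying assumptions, and the function name and input values are hypothetical. It is not our exact calculation, which also depends on the site p-values and on the targeting microRNA.

```python
from math import prod


def prob_at_least_one_selected(site_probs):
    """Probability that at least one site is conserved due to selection,
    given per-site probabilities and assuming the sites are independent."""
    for p in site_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # Complement of the probability that every site is conserved by chance.
    return 1.0 - prod(1.0 - p for p in site_probs)


# Two hypothetical sites in one transcript for one microRNA.
transcript_score = prob_at_least_one_selected([0.5, 0.5])
```

Under these assumptions, adding a site can never lower the transcript-level value, which matches the idea that transcripts with more conserved sites are ranked with higher confidence.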
microRNA targeting in the ORF compared to targeting in the 3'UTR
ORF targeting generally appears to have a weaker effect than targeting in the 3'UTR; however, a number of cases of ORF targeting have already been demonstrated. We have seen some evidence (both computational and experimental) that the disparity between ORF targeting and 3'UTR targeting may not be as strong in Drosophila as it is in mammals. Our paper above contains additional discussion of microRNA targeting in ORFs and references to previous work in this area.
The level of preferential conservation in humans is smaller than in Drosophila, and so the confidence of target predictions is lower. While 7-mer matches as a whole show evidence for preferential conservation, the majority of these sites cannot be predicted at high confidence. To retain only high-confidence target sites, we focused on transcripts containing conserved 8-mers and treated all 7-mers as non-conserved.
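The filtering rule for human predictions can be sketched as follows. The record layout, the "8mer"/"7mer" labels, the transcript names, and the cutoff value are hypothetical and chosen only to illustrate the rule; they do not reflect our actual data files.

```python
from typing import NamedTuple


class Site(NamedTuple):
    transcript: str
    seed_type: str  # assumed labels: "8mer" or "7mer"
    p_value: float


def transcripts_with_conserved_8mers(sites, cutoff):
    """Keep transcripts with at least one 8-mer site below the p-value cutoff.
    All 7-mer sites are treated as non-conserved, whatever their p-value."""
    return {
        site.transcript
        for site in sites
        if site.seed_type == "8mer" and site.p_value < cutoff
    }


# Made-up sites for three hypothetical transcripts.
sites = [
    Site("tx1", "8mer", 0.01),
    Site("tx2", "7mer", 0.001),  # ignored: 7-mers count as non-conserved
    Site("tx3", "8mer", 0.40),   # ignored: p-value above the cutoff
]
kept = transcripts_with_conserved_8mers(sites, cutoff=0.05)
```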
Please contact Michael Schnall-Levin (mschnall at gmail dot com)