Abstract:
The explosive growth of regulatory hypotheses from single-cell datasets demands accurate prioritization for in vivo validation, but current computational methods fail to shortlist a high-confidence subset that can feasibly be tested. We present Haystack, an algorithm that combines active learning and optimal transport theory to identify and prioritize transient but causally active transcription factors in cell lineages. We apply Haystack to single-cell observations, guiding efficient and cost-effective in vivo validations that reveal causal mechanisms of cell differentiation in Drosophila gut and blood lineages.

Competing Interest Statement:
The authors have declared no competing interest.