A general computational approach to predicting synergistic transcriptional cores that determine cell subpopulation identities.
- Integrated BioBank of Luxembourg
Advances in single-cell RNA-sequencing techniques have revealed the existence of distinct cell subpopulations. Identifying the transcription factors (TFs) that define the identity of these subpopulations remains a challenge. Here, we postulate that the identity of a subpopulation depends on its background subpopulations and is determined by a synergistic core combination of TFs, most of which are uniquely expressed in that subpopulation, together with TFs that are more broadly expressed across the background subpopulations. Building on this view, we develop a new computational method for determining such synergistic identity cores of subpopulations within a given cell population. Our method uses an information-theoretic measure to quantify transcriptional synergy and implements a novel algorithm to search for optimal synergistic cores. It requires only single-cell RNA-seq data as input and does not rely on any prior knowledge of candidate genes or gene regulatory networks. Hence, it can be applied directly to any cellular system, including those containing novel subpopulations. The method recapitulates known, experimentally validated identity TFs in eight published single-cell RNA-seq datasets. Furthermore, some of these identity TFs are known to trigger cell conversions between subpopulations. Thus, this methodology can help design strategies for cell conversion within a cell population, guiding experimentalists in stem cell research and regenerative medicine.
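
To make the notion of transcriptional synergy concrete, the following is a minimal sketch and not the method's actual measure, which this abstract does not specify. It assumes one common information-theoretic definition of pairwise synergy, I(A,B;S) - I(A;S) - I(B;S), where A and B are discretized (for example, on/off) expression values of two TFs across cells and S is the subpopulation label of each cell. The function names `mutual_information` and `pairwise_synergy`, the binarization, and the plug-in estimator are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))  # joint counts
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in terms of counts
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def pairwise_synergy(tf_a, tf_b, labels):
    """Assumed synergy score: I(A,B;S) - I(A;S) - I(B;S).

    Positive values mean the two TFs jointly carry more information about
    the subpopulation label than the sum of their individual contributions;
    negative values indicate redundancy.
    """
    joint = list(zip(tf_a, tf_b))  # treat the TF pair as one variable
    return (
        mutual_information(joint, labels)
        - mutual_information(tf_a, labels)
        - mutual_information(tf_b, labels)
    )


# Toy data (hypothetical): the label is the XOR of the two TF states, so
# neither TF is informative alone but the pair determines the label.
tf_a = [0, 0, 1, 1]
tf_b = [0, 1, 0, 1]
labels = [0, 1, 1, 0]
print(pairwise_synergy(tf_a, tf_b, labels))
```

Under this assumed definition, a score for a larger candidate core would require a multivariate generalization and a search over TF combinations, neither of which is described in enough detail here to sketch.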