Abstract: Deep models applied to single-cell transcriptome sequencing (scRNA-seq) extract gene expression features at single-cell resolution. However, scRNA-seq data collection suffers from "dropout" (missing data), which fills the gene expression matrix with a large number of technical zeros, so that some gene-gene associations are masked or distorted by noise. Mining such noisy data blindly tends to harm both the training and the inference of deep learning models, leading to problems such as batch effects, spurious differential gene expression results, and degraded performance, and thereby concealing the true expression relationships. To address these problems, this paper proposes a deep generative model that incorporates a data diffusion algorithm for single-cell transcriptome data. The diffusion algorithm shares information among similar cells, which removes noise from the cell count matrix and imputes "dropout" events at the same time, improving the clustering accuracy of the deep model and effectively removing batch effects.
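The idea of sharing information among similar cells can be illustrated with a small sketch. It is not the authors' implementation: it is a generic, MAGIC-style Markov diffusion written with NumPy only, and the function name `diffuse_counts`, the parameters `k` and `t`, the adaptive Gaussian kernel, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def diffuse_counts(counts, k=5, t=3):
    """Smooth a cells x genes count matrix by diffusion over a kNN cell graph.

    Hypothetical helper for illustration only; returns smoothed log1p values.
    """
    # Log-transform to damp the influence of highly expressed genes.
    x = np.log1p(np.asarray(counts, dtype=float))
    n_cells = x.shape[0]
    k = min(k, n_cells - 1)

    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)

    # Squared distance to the k-th nearest neighbour (index 0 is the cell itself).
    kth_d2 = np.sort(d2, axis=1)[:, k]
    sigma2 = np.where(kth_d2 > 0, kth_d2, 1.0)  # per-cell kernel bandwidth

    # Gaussian affinities, restricted to each cell's k nearest neighbours.
    affinity = np.exp(-d2 / sigma2[:, None])
    affinity = np.where(d2 <= kth_d2[:, None], affinity, 0.0)
    affinity = 0.5 * (affinity + affinity.T)  # symmetrise the graph

    # Row-normalise into a Markov transition matrix and take t diffusion steps.
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(markov, t) @ x


# Toy data (assumed): cells 0-2 express genes 0-1, cells 3-5 express genes 2-3.
# Cell 2 has a dropout in gene 0, cell 5 has a dropout in gene 3.
toy = np.array([
    [10, 10, 0, 0],
    [9, 11, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 10, 10],
    [0, 0, 11, 9],
    [0, 0, 10, 0],
])
smoothed = diffuse_counts(toy, k=2, t=2)
```

In this toy case the zero in cell 2, gene 0 is filled in from the two neighbouring cells of the same group, while genes that the whole group does not express stay at zero because no graph edge connects the two groups. A larger `t` smooths more aggressively and can blur real differences between cells, so in practice it would need tuning.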
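Basic information:
DOI: 10.13417/j.gab.043.000241
Chinese Library Classification (CLC) number: R318; TP18
Citation:
[1] SU Xiuxiu, LONG Faning. A study of single-cell feature extraction integrating a data diffusion algorithm with a deep generative model [J]. Genomics and Applied Biology, 2024, 43(02): 241-249. DOI: 10.13417/j.gab.043.000241.
Funding:
Supported by the National Natural Science Foundation of China (62141207)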
2024-01-12