CMStatistics 2021
Title: A fast and automated clustering of large scale high dimensional data: Application to scRNA-seq data
Authors: Shahina Rahman - Texas A&M University (United States) [presenting]
Suhasini Subbarao - Texas A&M (United States)
Valen Johnson - TAMU (United States)
Abstract: Technological advances now allow researchers to collect massive volumes of data. For example, in biomedical engineering, next-generation sequencing technologies make it possible to profile the transcriptome of individual cells. This technology can provide detailed catalogs of millions of cells found in a sample. Despite the large number of available clustering algorithms, very few can be applied to these massive high-dimensional datasets, owing to their high computational cost and the unreliability of their results. To address these issues, we have developed a fast and scalable clustering algorithm based on the Gram matrix transformation. Its major advantages over competing methods are that it has no major tuning parameters and runs in linear time. In addition, under mild assumptions, the method provides a theoretical guarantee on its results.