Title: Multiple kernel learning for integrative clustering in genomic precision medicine
Authors: Paul Kirk - University of Cambridge (United Kingdom)
Alessandra Cabassi - University of Cambridge (United Kingdom) [presenting]
Abstract: A method is presented to integrate information from diverse, high-dimensional omics datasets, together with clinical information, in order to define clinically actionable disease subtypes. We show how kernel methods, such as kernel k-means, can be used to perform integrative clustering using multiple kernel learning, demonstrating that any symmetric positive-definite matrix representing the pairwise similarities between patients can be used as a kernel matrix. It is therefore possible to define kernels using, for instance, the output of Bayesian clustering models or consensus clustering algorithms. The use of kernel methods allows the dimension of the problem to be reduced from $N \times P$ to $N \times N$: this is a great advantage in omics applications, where the number of features $P$ is usually much larger than the number of patients $N$. We further extend the approach to the (semi-)supervised setting, in which additional clinical ``side information'' is available (e.g. survival data), and demonstrate that this can help to guide the clustering toward more relevant stratifications. We apply these methods to cancer datasets, where we combine multiple omics datasets and use phenotypic traits as side information in order to identify disease subtypes.
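The unsupervised part of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the kernel weights are fixed by hand rather than learned, the data are synthetic, and all function names (`linear_kernel`, `normalize_kernel`, `coclustering_kernel`, `combine_kernels`, `kernel_kmeans`) are hypothetical. It shows one $N \times N$ kernel per dataset, a co-clustering matrix used as a kernel, a weighted combination of the kernels, and kernel k-means on the result.

```python
import numpy as np

def linear_kernel(X):
    """N x N Gram matrix of an N x P data matrix (P may far exceed N)."""
    return X @ X.T

def normalize_kernel(K):
    """Rescale K so that every diagonal entry equals 1."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def coclustering_kernel(label_runs):
    """Fraction of clustering runs in which each pair of patients co-clusters.

    Each run contributes a 0/1 block matrix Z Z^T, so the average is
    symmetric positive semi-definite.
    """
    runs = np.asarray(label_runs)
    return np.mean([np.equal.outer(r, r) for r in runs], axis=0)

def combine_kernels(kernels, weights):
    """Convex combination of kernels. Assumption: weights are fixed here,
    whereas multiple kernel learning would estimate them from the data."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Lloyd-style kernel k-means using only the N x N kernel matrix."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)  # random initial assignment
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            if not members.any():
                continue  # an empty cluster stays at infinite distance
            # ||phi(x_i) - mu_c||^2 = K_ii - 2 mean_j K_ij + mean_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Synthetic example: 20 patients in two groups, measured in two "omics" layers.
rng = np.random.default_rng(1)
truth = np.array([0] * 12 + [1] * 8)
signal = np.where(truth == 0, 3.0, -3.0)[:, None]
X1 = signal + rng.normal(size=(20, 200))   # N = 20, P = 200
X2 = signal + rng.normal(size=(20, 500))   # N = 20, P = 500

# Hypothetical label vectors from three runs of some other clustering method.
runs = [truth.copy() for _ in range(3)]
runs[1][0] = 1
runs[2][19] = 0

kernels = [normalize_kernel(linear_kernel(X1)),
           normalize_kernel(linear_kernel(X2)),
           coclustering_kernel(runs)]
K = combine_kernels(kernels, weights=[0.4, 0.4, 0.2])
labels = kernel_kmeans(K, n_clusters=2)
```

Whatever the number of features in each dataset, the clustering step only ever sees the $20 \times 20$ matrix `K`, which is the dimension reduction from $N \times P$ to $N \times N$ mentioned in the abstract.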