Clinical Data Analysis and Validation | Process Systems and Operations Research Laboratory

Breast cancer patients require systematic, targeted therapeutic treatment to achieve a better pathologic response. To develop and refine treatment guidelines, we apply mathematical methods to existing clinical data to identify the patient information most relevant to cancer treatment. Because the raw dataset combines patients’ personal and biomarker information, we apply clustering approaches to partition the data into groups. The objective is to probe the relationship between these groups and the observed pathologic response, in order to validate the corresponding therapeutic regimens and aid in the development of new therapies personalized to patient attributes.

The k-means clustering algorithm is an optimization-based partitioning method that assigns a cluster label to each object in the dataset [1]. This method is advantageous because it can be applied to continuous and discrete multidimensional datasets. However, in general, the structure of the data results in a nonconvex optimization problem, for which the conventional algorithm cannot guarantee a globally optimal solution. Additionally, because the number of clusters must be chosen a priori, the method often produces suboptimal classifications and requires additional analysis at the expense of quantitative rigor [2].
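To make the nonconvexity concrete, the following is a minimal sketch of the conventional (Lloyd-style) k-means iteration in plain Python. It is an illustration only, not the laboratory's implementation: the function names, the four toy points, and the use of random restarts are all assumptions introduced here. Depending on the starting centers, the iteration can stop at a partition whose cost is far from the best achievable one, and the number of clusters k still has to be supplied by the user.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, seed=0, max_iter=100):
    """Conventional k-means iteration (hypothetical helper for illustration).

    Returns (labels, centers, cost), where cost is the within-cluster
    sum of squared distances that k-means tries to minimize.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k data points chosen as initial centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: a local optimum
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    cost = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, cost


# Toy data (made up): two tight pairs of points that are far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Run the same algorithm from different random starting centers.
costs = [lloyd_kmeans(toy_points, k=2, seed=s)[2] for s in range(20)]
print(sorted(set(costs)))  # distinct final costs reached across restarts
```

Random restarts, as used above, are a common heuristic for reducing the risk of a poor local optimum, but they provide no guarantee of global optimality and do not address the choice of k.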

To overcome the deficiencies in quantitative rigor associated with conventional k-means clustering approaches, we are developing a novel, rigorous game-theoretic formulation that does not require a priori knowledge of the number of clusters. Our approach can determine the optimal number of clusters required to cover the dataset of interest and provide guarantees of global optimality.

By analyzing the resulting clusters and the detailed information of their members, we can verify the effectiveness of the corresponding treatments and the favorable diagnostic biomarkers. This analysis will provide invaluable information for future research on the design of therapies and therapeutics.
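One simple way to relate cluster membership to outcome is to compare the response rate across clusters. The sketch below assumes that each patient has a cluster label and a yes/no pathologic response flag. The function name and the toy values are hypothetical and do not come from any clinical dataset.

```python
from collections import defaultdict


def response_rate_by_cluster(labels, responded):
    """Fraction of responders in each cluster (hypothetical helper).

    labels:    cluster label per patient
    responded: True/False pathologic response per patient, same order
    """
    counts = defaultdict(lambda: [0, 0])  # cluster -> [responders, members]
    for label, r in zip(labels, responded):
        counts[label][0] += int(r)
        counts[label][1] += 1
    return {c: resp / n for c, (resp, n) in sorted(counts.items())}


# Synthetic example values, for illustration only.
example_labels = [0, 0, 0, 1, 1, 1, 1]
example_responded = [True, True, False, False, False, True, False]
rates = response_rate_by_cluster(example_labels, example_responded)
print(rates)
```

A large gap in response rate between clusters would suggest that the attributes defining those clusters deserve a closer look. Any real analysis would also need an appropriate statistical test and adequate sample sizes.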

References

[1] A. Likas, N. Vlassis, and J. J. Verbeek, “The global k-means clustering algorithm,” Pattern Recognition, vol. 36, no. 2, pp. 451-461, 2003.
[2] D. Pelleg, A. W. Moore, et al., “X-means: Extending k-means with efficient estimation of the number of clusters,” in ICML, vol. 1, pp. 727-734, 2000.