UNIFIED APPROACH TO DEPENDENT AND DISPARATE
CLUSTERING OF NONHOMOGENEOUS DATA
David R. Easterling1, Layne T. Watson2,3,4,
Naren Ramakrishnan2, Richard F. Helm5, Satish Tadepalli7,
M. Shahriar Hossain6
1 University of Dayton Research Institute
University of Dayton
300 College Park, Dayton, OH, 45469, USA
Departments of 2 Computer Science, 3 Mathematics, 4 Aerospace and Ocean Engineering, and 5 Biochemistry
Virginia Polytechnic & State University
Blacksburg, VA 24061, USA
6 Department of Computer Science
University of Texas at El Paso
El Paso, TX, 79968, USA
7 Bloomberg LP
New York City, NY, 10022, USA
Many data mining settings combine attribute-valued
descriptors over entities with specified relationships
between these entities. We present an approach to
cluster such nonhomogeneous datasets by using the relationships
to impose either dependent clustering or disparate
clustering constraints. Unlike prior work that views constraints
as Boolean criteria, we present a formulation that
allows constraints to be satisfied or violated in a smooth
manner. This enables us to achieve dependent clustering
and disparate clustering using the same optimization framework
by merely maximizing versus minimizing the objective
function. We present results on both synthetic data and
several real-world datasets.
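
As a rough illustration of how one objective can serve both goals, the following Python sketch builds a soft contingency table between two clusterings and measures how far its rows and columns deviate from uniform. Maximizing that deviation pushes toward dependent clusterings, while minimizing it pushes toward disparate ones. The function names, the softmax membership model, the `sharpness` parameter, and the KL-based score are all assumptions made for this sketch and are not taken from the abstract; no optimizer is included, only the evaluation of the score for fixed cluster centers.

```python
import math

def soft_memberships(points, centers, sharpness=5.0):
    """Softmax over negative squared distances; each row sums to 1.
    Memberships vary smoothly with the centers (assumed model)."""
    rows = []
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        m = min(d2)  # shift for numerical stability
        w = [math.exp(-sharpness * (d - m)) for d in d2]
        s = sum(w)
        rows.append([x / s for x in w])
    return rows

def contingency(mem_a, mem_b, relations):
    """Soft contingency table: entry (i, j) accumulates the product of
    memberships over every related pair (s, t)."""
    k_a, k_b = len(mem_a[0]), len(mem_b[0])
    table = [[0.0] * k_b for _ in range(k_a)]
    for s, t in relations:
        for i in range(k_a):
            for j in range(k_b):
                table[i][j] += mem_a[s][i] * mem_b[t][j]
    return table

def kl_from_uniform(v):
    """KL divergence of the normalized vector v from the uniform distribution."""
    total = sum(v)
    if total == 0:
        return 0.0
    n = len(v)
    return sum((x / total) * math.log((x / total) * n) for x in v if x > 0)

def deviation_from_uniform(table):
    """Sum of row-wise and column-wise KL divergences from uniform."""
    cols = [list(c) for c in zip(*table)]
    return sum(kl_from_uniform(r) for r in table) + sum(kl_from_uniform(c) for c in cols)

# Four entities on the corners of the unit square, each related to itself.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
relations = [(i, i) for i in range(len(points))]

split_by_x = soft_memberships(points, [(0.0, 0.5), (1.0, 0.5)])
split_by_y = soft_memberships(points, [(0.5, 0.0), (0.5, 1.0)])

# Two clusterings that agree: near-diagonal table, large deviation.
dependent = deviation_from_uniform(contingency(split_by_x, split_by_x, relations))
# Two clusterings that cut along different axes: uniform table, deviation near zero.
disparate = deviation_from_uniform(contingency(split_by_x, split_by_y, relations))

print(dependent, disparate)
```

Because the memberships are smooth in the cluster centers, the score changes gradually as a relationship moves from violated to satisfied, which is the property the abstract contrasts with Boolean constraints.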