UNIFIED APPROACH TO DEPENDENT AND DISPARATE
CLUSTERING OF NONHOMOGENEOUS DATA

Abstract

Many data mining settings involve a combination of attribute-valued descriptors over entities and specified relationships between these entities. We present an approach to clustering such nonhomogeneous datasets that uses the relationships to impose either dependent clustering or disparate clustering constraints. Unlike prior work that treats constraints as Boolean criteria, we present a formulation that allows constraints to be satisfied or violated in a smooth manner. This lets us achieve dependent clustering and disparate clustering within the same optimization framework, simply by maximizing or minimizing, respectively, the same objective function. We present results on both synthetic data and several real-world datasets.
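The abstract does not spell out the objective, so the following is only an illustrative sketch under assumptions of our own, not the paper's formulation. It assumes that each entity gets a soft cluster membership (a softmax over negative squared distances to hypothetical prototypes, with a hypothetical temperature `beta`), that related pairs of entities fill a soft contingency table between the two clusterings, and that dependence is scored as the KL divergence of each row and column of that table from the uniform distribution. All function names are hypothetical. What the sketch shows is the abstract's central point: because the table is a smooth function of the prototypes, a single score can be maximized for dependent clustering or minimized for disparate clustering.

```python
import math

# Hypothetical sketch: NOT the paper's exact objective, only an illustration of
# smooth constraint satisfaction via soft memberships and a contingency table.

def soft_memberships(points, prototypes, beta=1.0):
    """Assumed soft assignment: softmax over negative squared distances."""
    result = []
    for p in points:
        logits = [-beta * sum((a - b) ** 2 for a, b in zip(p, c)) for c in prototypes]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

def contingency(u, v, relations):
    """N[c][d] = sum over related pairs (i, j) of u[i][c] * v[j][d]."""
    k, l = len(u[0]), len(v[0])
    table = [[0.0] * l for _ in range(k)]
    for i, j in relations:
        for c in range(k):
            for d in range(l):
                table[c][d] += u[i][c] * v[j][d]
    return table

def kl_to_uniform(row):
    """KL divergence of the normalized row from the uniform distribution."""
    total, n = sum(row), len(row)
    kl = 0.0
    for x in row:
        p = x / total
        if p > 0:
            kl += p * math.log(p * n)
    return kl

def dependence_score(table):
    """Assumed score: large when related entities land in matching clusters,
    near zero when the two cluster labelings are unrelated."""
    columns = [list(col) for col in zip(*table)]
    return sum(kl_to_uniform(r) for r in table) + sum(kl_to_uniform(c) for c in columns)

def objective(points_a, protos_a, points_b, protos_b, relations, mode, beta=1.0):
    """Value to MINIMIZE: negated score for 'dependent', raw score for 'disparate'."""
    if mode not in ("dependent", "disparate"):
        raise ValueError("mode must be 'dependent' or 'disparate'")
    u = soft_memberships(points_a, protos_a, beta)
    v = soft_memberships(points_b, protos_b, beta)
    score = dependence_score(contingency(u, v, relations))
    return -score if mode == "dependent" else score

# Toy usage: two 1-D datasets, each with two obvious groups.
A = [(0.0,), (0.1,), (5.0,), (5.1,)]
B = [(0.0,), (0.2,), (5.0,), (5.2,)]
protos = [(0.0,), (5.0,)]
aligned = [(0, 0), (1, 1), (2, 2), (3, 3)]  # relations stay within matching groups
crossed = [(0, 0), (1, 2), (2, 1), (3, 3)]  # relations cut across the groups
```

With these toy inputs the aligned relations yield a nearly diagonal table and a high score, while the crossed relations yield a nearly uniform table and a score close to zero. Passing `objective` to any general-purpose minimizer over the prototypes would then push toward dependent or disparate clusterings depending only on `mode`.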

Citation details of the article

Journal: International Journal of Applied Mathematics
Journal ISSN (Print): 1311-1728
Journal ISSN (Electronic): 1314-8060
Volume: 32
Issue: 3
Year: 2019

DOI: 10.12732/ijam.v32i3.3
