Journal of Beijing Institute of Technology
Volume 24, Issue 4
CAO Qi-min, GUO Qiao, WU Xiang-hua. Similarity matrix-based K-means algorithm for text clustering[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2015, 24(4): 566-572. doi: 10.15918/j.jbit1004-0579.201524.0421

Similarity matrix-based K-means algorithm for text clustering

doi: 10.15918/j.jbit1004-0579.201524.0421
Received Date: 2014-04-14

The K-means algorithm is one of the most widely used algorithms in cluster analysis. To address the problems caused by randomly selecting the initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. Because the improved algorithm does not choose its initial center points at random, it supplies effective initial points to the clustering process and reduces the fluctuation in clustering results that initial-point selection would otherwise cause, which yields better clustering quality. Experimental results show that the improved K-means algorithm achieves a greatly improved F-measure and more stable clustering results.
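The abstract does not spell out how the initial centers are derived from the similarity matrix, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes documents are already term-weight vectors compared by cosine similarity, and it uses a hypothetical deterministic seeding rule: the first seed is the document with the largest row sum in the similarity matrix (the most "central" one), and each further seed is the remaining document whose highest similarity to any already chosen seed is smallest. The F-measure shown is the class-weighted maximum-F definition commonly used to evaluate text clustering; the paper's exact formula is assumed, not confirmed. All function names are illustrative.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(docs):
    """Pairwise matrix S with S[i][j] = cosine(doc_i, doc_j)."""
    return [[cosine(a, b) for b in docs] for a in docs]


def init_centers_from_similarity(docs, k):
    """Hypothetical deterministic seeding from S (assumed rule, not the paper's):
    seed 1 = document with the largest row sum of S;
    next seeds = document whose max similarity to the chosen seeds is smallest."""
    s = similarity_matrix(docs)
    n = len(docs)
    seeds = [max(range(n), key=lambda i: sum(s[i]))]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(s[i][c] for c in seeds)))
    return [list(docs[i]) for i in seeds]


def kmeans(docs, k, max_iter=100):
    """K-means: assign by highest cosine similarity, update centers as mean vectors."""
    centers = init_centers_from_similarity(docs, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(d, centers[c])) for d in docs]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def f_measure(classes, clusters):
    """Overall F = sum over classes i of (n_i / n) * max_j F(i, j),
    where F(i, j) = 2PR / (P + R), P = n_ij / n_j, R = n_ij / n_i."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))
    total = 0.0
    for i, n_i in class_sizes.items():
        best = 0.0
        for j, n_j in cluster_sizes.items():
            n_ij = joint[(i, j)]
            if n_ij:
                p, r = n_ij / n_j, n_ij / n_i
                best = max(best, 2 * p * r / (p + r))
        total += n_i / n * best
    return total


if __name__ == "__main__":
    # Toy term-frequency vectors (illustrative data, not from the paper).
    docs = [[3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],
            [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 4]]
    truth = ["sport"] * 3 + ["finance"] * 3
    labels = kmeans(docs, 2)
    print(labels)                     # expected: [1, 1, 1, 0, 0, 0]
    print(f_measure(truth, labels))   # expected: 1.0
```

Because the seeding involves no randomness, repeated runs on the same data return identical partitions, which is the kind of stability the abstract attributes to replacing random initial centers. The farthest-first rule here is only one plausible way to spread seeds using the similarity matrix; a real text pipeline would also build the vectors with a weighting scheme such as TF-IDF.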