CAO Qi-min, GUO Qiao, WU Xiang-hua. Similarity matrix-based K-means algorithm for text clustering[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2015, 24(4): 566-572. doi: 10.15918/j.jbit1004-0579.201524.0421

Similarity matrix-based K-means algorithm for text clustering

doi:10.15918/j.jbit1004-0579.201524.0421

School of Automation, Beijing Institute of Technology, Beijing 100081, China

Received Date: 2014-04-14

Abstract

The K-means algorithm is one of the most widely used algorithms in cluster analysis. To address the problems caused by the random selection of initial center points in the traditional algorithm, this paper proposes an improved K-means algorithm based on the similarity matrix. By avoiding random selection, the improved algorithm supplies effective initial points for the clustering process and reduces the fluctuation in clustering results that arises from the choice of initial points, so a better clustering quality can be obtained. Experimental results show that the improved K-means algorithm achieves a substantially higher F-measure and more stable clustering results.
- text clustering
- K-means algorithm
- similarity matrix
- F-measure
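
The abstract does not spell out how the similarity matrix is turned into initial centers, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes documents are already represented as term-weight vectors, builds a cosine similarity matrix, picks the first seed as the document most similar to all others, and picks each further seed as the document least similar to the seeds chosen so far. The function names (`cosine_similarity_matrix`, `select_initial_centers`, `kmeans`, `f_measure`), the seeding rule, and the toy data are all assumptions made for illustration. The F-measure shown is the class-weighted form commonly used for document clustering, which may differ in detail from the one used in the paper.

```python
import numpy as np

def cosine_similarity_matrix(X):
    """Return the n x n cosine similarity matrix of the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T

def select_initial_centers(S, k):
    """Deterministically pick k seed indices from similarity matrix S.

    Assumed rule (not taken from the paper): start from the document with
    the largest total similarity, then repeatedly add the document whose
    highest similarity to any existing seed is smallest.
    """
    seeds = [int(np.argmax(S.sum(axis=1)))]
    while len(seeds) < k:
        max_sim_to_seeds = S[:, seeds].max(axis=1)
        max_sim_to_seeds[seeds] = np.inf  # never re-pick a seed
        seeds.append(int(np.argmin(max_sim_to_seeds)))
    return seeds

def kmeans(X, k, max_iter=100):
    """Standard Lloyd iterations, started from similarity-based seeds."""
    X = np.asarray(X, dtype=float)
    centers = X[select_initial_centers(cosine_similarity_matrix(X), k)].copy()
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every document to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

def f_measure(true_labels, cluster_labels):
    """Class-weighted F-measure: sum_i (n_i / n) * max_j F(i, j)."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    total = 0.0
    for c in np.unique(true_labels):
        in_class = true_labels == c
        best = 0.0
        for j in np.unique(cluster_labels):
            in_cluster = cluster_labels == j
            overlap = np.sum(in_class & in_cluster)
            if overlap == 0:
                continue
            precision = overlap / in_cluster.sum()
            recall = overlap / in_class.sum()
            best = max(best, 2 * precision * recall / (precision + recall))
        total += in_class.sum() / len(true_labels) * best
    return total

# Toy term-count vectors (hypothetical): two topics with disjoint vocabularies.
docs = np.array([[3, 1, 0, 0], [2, 1, 0, 0], [4, 2, 0, 0],
                 [0, 0, 1, 3], [0, 0, 2, 4], [0, 0, 1, 2]])
labels, centers = kmeans(docs, k=2)
score = f_measure([0, 0, 0, 1, 1, 1], labels)
```

Because the seeds depend only on the similarity matrix, repeated runs on the same data return the same clustering, which is the kind of stability the abstract attributes to avoiding random initialization.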
