A Comparison of Text Classification Methods k-NN, Naïve Bayes, and Support Vector Machine for News Classification

Fanny Fanny; Yohan Muliono; Fidelson Tanzil

doi:10.30591/jpit.v3i2.828

Abstract

The rapid growth of the Internet can make it hard for users to find news in a given category, especially when there is a large volume of news to categorize. News categorization is a technique that assigns each news item to a category, which makes retrieval easier for users. The Internet holds vast amounts of information, particularly news, so accurate and speedy access is becoming ever more difficult. This paper compares news categorization using k-Nearest Neighbor, Naïve Bayes, and Support Vector Machine. The experiments use a variety of parameter settings and several preprocessing steps. They show that k-Nearest Neighbor achieves accuracy competitive with Support Vector Machine, whereas Naïve Bayes produces only an average result: not as good as the best results of k-Nearest Neighbor and Support Vector Machine, yet not as bad as the worst results those two methods reach. Overall, k-Nearest Neighbor with the correlation measure type produces the best result in this experiment.
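To make the best-performing configuration more concrete, the following is a minimal sketch of k-Nearest Neighbor news classification with a correlation-based similarity. It assumes that "correlation" means the Pearson correlation between bag-of-words term-count vectors; the abstract does not specify the exact measure, the preprocessing steps, or the value of k. The function names, the toy headlines, and the choice k = 3 are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(tokens, vocabulary):
    """Map a token list to term counts over a fixed, ordered vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def pearson(u, v):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def knn_predict(train_docs, train_labels, query, k=3):
    """Label a query by majority vote among its k most correlated training documents."""
    tokenized = [tokenize(doc) for doc in train_docs]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    vectors = [to_vector(doc, vocabulary) for doc in tokenized]
    # Query terms outside the training vocabulary are ignored.
    query_vector = to_vector(tokenize(query), vocabulary)
    ranked = sorted(
        zip(vectors, train_labels),
        key=lambda pair: pearson(pair[0], query_vector),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-written headlines (hypothetical data, not the paper's corpus).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal in the match",
    "the coach praised the team after the game",
    "the central bank raised interest rates",
    "stocks fell as the market reacted to interest rates",
    "the bank reported strong quarterly profit",
]
train_labels = ["sport", "sport", "sport", "economy", "economy", "economy"]

print(knn_predict(train_docs, train_labels, "the team scored a goal"))
print(knn_predict(train_docs, train_labels, "the market and interest rates"))
```

A realistic pipeline would add steps such as stop-word removal and term weighting before the similarity computation; the sketch omits them to keep the correlation-based neighbor search visible.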
