The Effect of Word2Vec Parameters on Deep Learning Performance in Sentiment Classification

Dwi Intan Af'idah, Dairoh Dairoh, Sharfina Febbi Handayani, Riszki Wijayatun Pratiwi

Abstract


The difficulty of sentiment classification on large volumes of text data can be overcome with deep learning. Before a deep learning model is trained and tested, a word feature extraction step is required. Word2Vec is often used for feature extraction in sentiment classification pre-training because it captures the semantic meaning of text, representing words with close meanings by similar vectors. Word2Vec has three parameters that affect model learning: the architecture, the evaluation method, and the vector dimension. This study aims to determine the effect of each Word2Vec parameter on deep learning performance in sentiment classification; the effect is measured by evaluating the accuracy of the resulting deep learning models. The results indicate that all three Word2Vec parameters influence the performance of the deep learning model in sentiment classification. The parameter combination that produces the highest average accuracy consists of the CBOW (Continuous Bag of Words) architecture, the Hierarchical Softmax evaluation method, and a dimension of 100. CBOW performs better because it is slightly more accurate for frequently occurring words, and the dataset in this study contains many such words. Hierarchical Softmax shows better results because its binary tree structure lets rarely occurring words inherit the vector representations of the nodes above them. A dimension of 100 yields better accuracy because it is proportionate to the dataset size of 10,000 reviews.
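For readers who want to see how the three studied parameters are set in practice, the minimal sketch below uses the Gensim library's Word2Vec API; this is an assumption for illustration, since the abstract does not name the implementation used, and the tokenized `reviews` corpus is a placeholder standing in for the 10,000-review dataset. (Gensim 4.x calls the dimension `vector_size`; older 3.x versions call it `size`.)

```python
# A minimal sketch, assuming the Gensim implementation of Word2Vec;
# the paper does not specify its tooling, and `reviews` is a toy placeholder corpus.
from gensim.models import Word2Vec

# Placeholder: a few tokenized review sentences stand in for the real dataset.
reviews = [
    ["makanan", "enak", "dan", "pelayanan", "ramah"],
    ["kamar", "kotor", "dan", "pelayanan", "lambat"],
]

# The best-performing combination reported in the abstract:
# CBOW architecture, Hierarchical Softmax evaluation method, dimension 100.
model = Word2Vec(
    sentences=reviews,
    vector_size=100,  # embedding dimension (the "dimension" parameter)
    sg=0,             # 0 = CBOW architecture, 1 = Skip-gram
    hs=1,             # 1 = Hierarchical Softmax evaluation method
    negative=0,       # disable negative sampling when hierarchical softmax is on
    window=5,         # context window; not a studied parameter, Gensim's default
    min_count=1,      # keep every word in this tiny toy corpus
)

# Each vocabulary word is now a 100-dimensional vector that can feed a
# deep learning classifier as its input representation.
vector = model.wv["pelayanan"]
print(vector.shape)  # (100,)
```

To reproduce the study's comparison, one would sweep `sg` over {0, 1}, `hs` over {0, 1}, and `vector_size` over several values (100 among them), then compare the downstream classifier's accuracy for each combination.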

 


Keywords


word2vec, hierarchical softmax, continuous bag of words, dimensions, sentiment classification


DOI: https://doi.org/10.30591/jpit.v6i3.3016



This work is licensed under a Creative Commons Attribution 4.0 International License.
