Negation Handling in Sentiment Analysis in Spanish texts
Samara Gretel Villalba Osornio
Dr. Luis Villaseñor Pineda & Dr. Manuel Montes y Gómez
Computer Science Department, Instituto Nacional de Astrofísica Óptica y Electrónica

Introduction

Image 1. Overview of sentiment analysis.

The business world is interested in the opinions of the community because they can help predict market behavior. Sentiment analysis is a challenge for the scientific community because the documents contain many linguistic phenomena and informal expressions. Negation is a very common and complex linguistic phenomenon that alters the truth value of an expression, for example "the movie is not boring". The traditional approach to text classification (e.g., bag of words) cannot handle negation.

Related Work

Diagram 1. Taxonomy of the state of the art.

Corpus details

The Corpus of Movie Reviews (CMR) is a Spanish corpus containing 3878 movie reviews obtained from the website MuchoCine. The reviews are rated from 1 to 5, where 1 is a bad movie and 5 an excellent movie. Opinions rated 1 or 2 are negative; opinions rated 4 or 5 are positive. In the experiments we used 1270 documents for each class.

Table 1. Appearance of negative particles in the corpus.

Proposal

Methodology:

1) Compute the word weights.
[Equation not recovered.]
Where:
- F is the frequency of the word n in class C.

2) Identify negative particles: no (not), sin (without), ni (nor), nada (nothing), nunca (never), tampoco (neither).

3) Define the negation scope.
Algorithm 1. Determining the negation scope.
The document is represented by:
[Equation not recovered.]
Where:
- X are the normal words of the document.
- Y are the negated words.

4) Modify the polarity: mirror effect.
Pretty: Positive probability = 0.7, Negative probability = 0.3
No_Pretty: Positive probability = 0.3, Negative probability = 0.7

5) Apply the classification methods.
We used Multinomial Naive Bayes and proposed two variations to treat negation.
Multinomial Naive Bayes (MNB)
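Steps 2 to 4 of the methodology can be illustrated with a minimal Python sketch. It is not the poster's Algorithm 1, which is not reproduced in the text: the scope rule used here (up to `window` tokens after a cue, cut short by punctuation) is an assumption made for illustration, and the names `mark_negation`, `mirrored_probs`, and `window` are hypothetical. Only the list of negative particles, the "no_" mark, and the 0.7 / 0.3 mirror example come from the poster.

```python
import re

# Negative particles listed in step 2 of the methodology.
NEGATION_CUES = {"no", "sin", "ni", "nada", "nunca", "tampoco"}


def mark_negation(text, window=3):
    """Prefix tokens inside a negation scope with "no_".

    ASSUMPTION: the scope is at most `window` tokens after a cue and is
    closed early by any punctuation mark. This is a stand-in rule, not the
    poster's Algorithm 1. The cue itself is dropped from the output.
    """
    # Split into word tokens and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    marked = []
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokens:
        if not re.match(r"\w", tok):
            remaining = 0            # punctuation closes any open scope
        elif tok in NEGATION_CUES:
            remaining = window       # a cue opens (or reopens) a scope
        elif remaining > 0:
            marked.append("no_" + tok)
            remaining -= 1
        else:
            marked.append(tok)
    return marked


def mirrored_probs(word_probs, token):
    """Return (P(positive), P(negative)) for a token.

    For a "no_" token, the two probabilities of the base word are swapped
    (the mirror effect of step 4).
    """
    if token.startswith("no_"):
        positive, negative = word_probs[token[len("no_"):]]
        return negative, positive
    return word_probs[token]


marked = mark_negation("La película no es aburrida, pero nunca sorprende.")
flipped = mirrored_probs({"pretty": (0.7, 0.3)}, "no_pretty")
```

Here `marked` separates the normal words from the negated ones (the X and Y of step 3), and `flipped` reproduces the Pretty / No_Pretty example. A fixed window is only one possible scope rule; a syntactic rule could be substituted inside `mark_negation` without changing the mirror step.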
Problem Statement

The problem of negation in SA for Spanish reviews includes:
- Identifying the negative cues (words, prefixes, suffixes).
- Determining whether all negative cues have the same effect.
- Determining the negation scope.
- Defining the type of change that negation causes.
- Discovering whether negation is used in the same way across domains.

When traditional text classification approaches are used, the order of the words is lost, and with it the possibility of correctly determining the effect of negation.

Multinomial Naive Bayes Variation 1 (MNB-1)
Uses the representation obtained in the third step.

Multinomial Naive Bayes Variation 2 (MNB-2)
Disregards the words that occurred in negated contexts.

Results

Graphic 1. Results of the classification with the Multinomial Naive Bayes variations.

The results show that the baseline was improved by using a supervised approach for the classification. The treatment of negation did not change the accuracy significantly. The reason is that the mirror effect does not correctly represent the distribution and values of the words that appear in negated contexts.

Future Work

In future work, the negation handling algorithm presented here will be applied to the training set to obtain a vocabulary that includes words with the prefix "no_", indicating that the word was affected by a negation. The marked words will be treated in the same way as the rest of the features, with their probabilities calculated from their frequencies in the documents.

References

1. Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 conference on Empirical methods in natural language processing-Volume 10 (pp. 79-86). Association for Computational Linguistics.
2. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.
3. Jiménez Zafra, S. M., Martínez Cámara, E., Martín Valdivia, M. T., & Molina González, M. D. (2015). Tratamiento de la Negación en Análisis de Opiniones en Español. Procesamiento del Lenguaje Natural, 54, 367-44.
4. Narayanan, V., Arora, I., & Bhatia, A. (2013). Fast and accurate sentiment classification using an enhanced Naive Bayes model. In Intelligent Data Engineering and Automated Learning–IDEAL 2013 (pp. 194-201). Springer Berlin Heidelberg.
5. Cruz, F. L., Troyano, J. A., Enriquez, F., & Ortega, J. (2008). Clasificación de documentos basada en la opinión: experimentos con un corpus de críticas de cine en español. Procesamiento de Lenguaje Natural, 41.

Coordinación de Ciencias Computacionales, Instituto Nacional de Astrofísica Óptica y Electrónica, Luis Enrique Erro # 1, Tonantzintla, Puebla, México. Apartado Postal 51 y 216, 72000. Tel: (222) 266-31-00