Negation Handling in Sentiment Analysis in Spanish texts

Samara Gretel Villalba Osornio
Dr. Luis Villaseñor Pineda & Dr. Manuel Montes y Gómez
Computer Science Department
Instituto Nacional de Astrofísica Óptica y Electrónica
Introduction
Proposal
Methodology:
1) Compute the word weights.
Image 1. Overview of sentiment analysis.
The business world is interested in the
opinions of the community because they
can serve as indicators for market prediction.
Sentiment analysis is a challenge for
the scientific community because the
documents contain many linguistic
phenomena and informal expressions.
Negation is a very common and complex
linguistic phenomenon that alters the
truth value of an expression.
– Ex.: "the movie is not boring"
Traditional text classification approaches
(e.g., bag of words) cannot handle
negation.
Where:
- F is the frequency of the word n in class C.
Corpus details
The Corpus of Movie Reviews (CMR) is a
Spanish corpus containing 3878 movie reviews
obtained from the website MuchoCine. The
reviews are rated from 1 to 5, where 1 is a bad
movie and 5 an excellent movie. The opinions
rated with 1 or 2 are negative opinions. The
opinions with 4 or 5 are positive opinions. In
the experiments we used 1270 documents for
each class.
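The rating-to-class mapping described above can be sketched as follows. This is only an illustration: the function name `rating_to_label` is hypothetical, and the handling of reviews rated 3 is an assumption, since the text only says which ratings count as negative and positive.

```python
from typing import Optional


def rating_to_label(rating: int) -> Optional[str]:
    """Map a 1-5 rating to a polarity class as described for the CMR corpus."""
    if rating in (1, 2):
        return "negative"
    if rating in (4, 5):
        return "positive"
    # Assumption: reviews rated 3 are left out of the two-class experiments.
    return None


sample_ratings = [1, 2, 3, 4, 5]
labels = [rating_to_label(r) for r in sample_ratings]
```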
2) Identify negative particles: no, sin, ni, nada,
nunca, tampoco.
3) Define the negation scope.
Table 1. Appearance of negative particles in the
corpus.
Results
Related Work
Algorithm 1. Determining the negation scope.
The document is represented by:
Where:
– X are the normal words of the document.
– Y are the negated words.
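Steps 2 and 3 and the X/Y representation can be illustrated with a small sketch. Algorithm 1 itself is not reproduced in this text, so the scope rule below is an assumption: a common heuristic in which the scope runs from a negative particle up to a fixed number of tokens or the next punctuation mark. The names `split_by_negation` and `max_scope` are hypothetical, and dropping the particles themselves is also an assumption.

```python
import re

# Negative particles listed in step 2.
NEGATIVE_PARTICLES = {"no", "sin", "ni", "nada", "nunca", "tampoco"}
# Assumption: punctuation closes the negation scope.
SCOPE_END = {".", ",", ";", ":", "!", "?"}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[.,;:!?]", text.lower())


def split_by_negation(text, max_scope=3):
    """Return (X, Y): normal words and negated words marked with 'no_'."""
    normal, negated = [], []
    remaining = 0  # tokens still inside the current negation scope
    for tok in tokenize(text):
        if tok in SCOPE_END:
            remaining = 0
        elif tok in NEGATIVE_PARTICLES:
            remaining = max_scope
        elif remaining > 0:
            negated.append("no_" + tok)
            remaining -= 1
        else:
            normal.append(tok)
    return normal, negated


x, y = split_by_negation("La película no es aburrida, pero es larga.")
```

With this heuristic, "es" and "aburrida" land in Y as "no_es" and "no_aburrida", while the words after the comma stay in X.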
4) Modify the polarity: mirror effect.
Pretty:
Positive probability = 0.7
Negative probability = 0.3
No_Pretty:
Positive probability = 0.3
Negative probability = 0.7
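The mirror effect in the example above amounts to swapping the two class probabilities of a word when it is negated. A minimal sketch, assuming word probabilities are stored in a dictionary (the names `mirror` and `word_probs` are hypothetical):

```python
def mirror(probs):
    """Swap the positive and negative probabilities of a word."""
    return {"positive": probs["negative"], "negative": probs["positive"]}


# Values taken from the "Pretty" example.
word_probs = {"pretty": {"positive": 0.7, "negative": 0.3}}
word_probs["no_pretty"] = mirror(word_probs["pretty"])
```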
5) Apply the classification methods.
We used Multinomial Naive Bayes and
proposed two variations to handle negation.
Multinomial Naive Bayes (MNB)
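For reference, a textbook Multinomial Naive Bayes classifier can be sketched as below. This is the standard formulation with Laplace smoothing, which may differ from the exact weighting used in this work. The names `train_mnb`, `classify` and `alpha` and the toy training data are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, alpha=1.0):
    """Train on (tokens, label) pairs; return log priors and log likelihoods."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Laplace smoothing: add alpha to every word count in the class.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def classify(tokens, log_prior, log_like):
    """Pick the class with the highest log posterior; skip unseen words."""
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
train = [
    (["buena", "excelente", "buena"], "positive"),
    (["mala", "aburrida"], "negative"),
]
log_prior, log_like = train_mnb(train)
prediction = classify(["buena", "pelicula"], log_prior, log_like)
```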
Diagram 1. Taxonomy of the state of the art.
Problem Statement
The problem of negation in SA for Spanish
reviews includes:
– Identifying the negative cues (words,
prefixes, suffixes).
– Determining whether all negative cues have
the same effect.
– Determining the negation scope.
– Defining the type of change that negation
causes.
– Discovering whether negation is used in
the same way across domains.
With traditional text classification
approaches, the order of the words is lost, and
with it the possibility of correctly determining
the effect of negation.
Multinomial Naive Bayes Variation 1 (MNB-1)
Graphic 1. Results of the classification with the
Multinomial Naive Bayes variations.
The results show that the supervised
classification approach improved on the
baseline. The negation treatment did not
change the accuracy significantly. The
reason is that the mirror effect does not
correctly represent the distribution and
values of the words that appear in negated
contexts.
Future Work
In future work, the negation handling algorithm
presented here will be applied to the training
set to obtain a vocabulary that includes words
with the prefix "no_", indicating that they were
affected by a negation. The marked words
will be treated in the same way as the
rest of the features, with their
probabilities calculated from their frequencies in
the documents.
MNB-1: uses the representation obtained in the third step.
Multinomial Naive Bayes Variation 2 (MNB-2)
Disregards the words that occurred in negated
contexts.
References
1. Pang, B., Lee, L., & Vaithyanathan, S. (2002, July). Thumbs up?: Sentiment
classification using machine learning techniques. In Proceedings of the ACL-02
Conference on Empirical Methods in Natural Language Processing, Volume 10
(pp. 79-86). Association for Computational Linguistics.
2. Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based
methods for sentiment analysis. Computational Linguistics, 37(2), 267-307.
3. Jiménez Zafra, S. M., Martínez Cámara, E., Martín Valdivia, M. T., & Molina
González, M. D. (2015). Tratamiento de la Negación en Análisis de Opiniones en
Español. Procesamiento del Lenguaje Natural, 54, 367-44.
4. Narayanan, V., Arora, I., & Bhatia, A. (2013). Fast and accurate sentiment
classification using an enhanced Naive Bayes model. In Intelligent Data Engineering and
Automated Learning - IDEAL 2013 (pp. 194-201). Springer Berlin Heidelberg.
5. Cruz, F. L., Troyano, J. A., Enriquez, F., & Ortega, J. (2008). Clasificación de
documentos basada en la opinión: experimentos con un corpus de críticas de cine
en español. Procesamiento de Lenguaje Natural, 41.
Coordinación de Ciencias Computacionales
Instituto Nacional de Astrofísica Óptica y Electrónica, Luis Enrique Erro # 1, Tonantzintla, Puebla, México
Apartado Postal 51 y 216, 72000 Tel: (222) 266-31-00