3. Web Content Mining
3.1 Introduction to Sentiment Analysis / Opinion Mining
Detection of stances and opinions towards people, companies, and products/services has tremendous business value: improving products and services, targeted advertising, revealing trends in election campaigns, …
Sentiment analysis, or opinion mining, is the computational study of people’s opinions, appraisals, attitudes, and emotions towards entities, individuals, issues, events, topics, and their attributes (aspects)
A general sentiment analysis framework aims to answer:
- Who is the opinion holder? -> Opinion holder
- Towards whom or what is the opinion/sentiment expressed? -> Target
- What is the polarity and intensity of the opinion?
- Is an opinion associated with a time-span?
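The four questions above map naturally onto a record with one field per answer. A minimal sketch in Python follows; the class name `Opinion`, its field names, and the encodings chosen for polarity and intensity are illustrative assumptions, not a standard representation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Opinion:
    """One extracted opinion; all names and encodings here are hypothetical."""
    holder: str                        # who expresses the opinion
    target: str                        # entity or aspect the opinion is about
    polarity: int                      # assumed encoding: +1 positive, -1 negative, 0 neutral
    intensity: float                   # assumed scale: 0.0 (weak) to 1.0 (strong)
    valid_from: Optional[date] = None  # optional start of the associated time-span
    valid_to: Optional[date] = None    # optional end of the associated time-span


# Example record: a strongly negative opinion with no time-span attached.
review = Opinion(holder="reviewer_42", target="phone battery", polarity=-1, intensity=0.8)
```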
3.2 Constructing Sentiment Lexicons
Sentiment clues (opinion words, sentiment-bearing words) – words and phrases used to express some desired or undesired state
Positive clues: good, amazing, beautiful
Negative clues: bad, awful, terrible, poor
Sentiment clues are often domain-dependent => Separate sentiment lexicons need to be constructed for different domains
Example: "quiet" is a negative clue for a speakerphone but a positive clue for a car engine
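One simple way to reflect this domain dependence is to key the lexicon by domain. The sketch below is only an illustration: the domain names, the lexicon entries, and the helper `polarity` are all made up for this example.

```python
# Hypothetical per-domain lexicons: word -> polarity (+1 positive, -1 negative).
LEXICONS = {
    "speakerphone": {"quiet": -1, "clear": +1},
    "car_engine": {"quiet": +1, "noisy": -1},
}


def polarity(word: str, domain: str) -> int:
    """Return the polarity of `word` in `domain`, or 0 if the word or domain is unknown."""
    return LEXICONS.get(domain, {}).get(word.lower(), 0)
```

With this toy data, `polarity("quiet", "speakerphone")` is negative while `polarity("quiet", "car_engine")` is positive.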
3.2.1 Automated acquisition of sentiment lexicons
Automated acquisition of sentiment lexicons is most often semi-supervised (or weakly supervised)
- Start from a small seed lexicon of sentiment words
- Iteratively augment the lexicon based on links between words already in the lexicon and words in a large general lexicon or a large corpus
- Stop when there are no more reliable candidate words to be added to the lexicon
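The three steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions: the `LINKS` table, its link strengths, and the `threshold` used to decide which candidates count as "reliable" are invented for the example, and real systems derive such links from a dictionary or corpus.

```python
from typing import Dict, List, Tuple

# Hypothetical association links: word -> [(linked word, link strength in [0, 1])].
LINKS: Dict[str, List[Tuple[str, float]]] = {
    "good": [("great", 0.9), ("fine", 0.4)],
    "great": [("superb", 0.8)],
    "bad": [("awful", 0.9)],
    "awful": [("dreadful", 0.7), ("odd", 0.3)],
}


def bootstrap(seed: Dict[str, int], threshold: float = 0.6) -> Dict[str, int]:
    """Grow a polarity lexicon from `seed` until no reliable candidate remains."""
    lexicon = dict(seed)
    while True:
        # Candidates: unseen words linked strongly enough to a word in the lexicon.
        new_words: Dict[str, int] = {}
        for word, pol in lexicon.items():
            for linked, strength in LINKS.get(word, []):
                if linked not in lexicon and strength >= threshold:
                    new_words.setdefault(linked, pol)  # inherit the polarity
        if not new_words:  # stopping criterion: nothing reliable left to add
            return lexicon
        lexicon.update(new_words)


lexicon = bootstrap({"good": +1, "bad": -1})
```

Weakly linked words such as "fine" and "odd" fall below the assumed threshold and are never added, which is the role the stopping criterion plays in the description above.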
Approaches for constructing sentiment lexicons are either Dictionary-based or Corpus-based
Often there is a final step of manual cleansing of automatically derived sentiment lexicons
3.2.1.1 Dictionary-Based Sentiment Lexicon Acquisition
Bootstrapping using a small seed sentiment lexicon, e.g., 10 positive and 10 negative sentiment words
Idea: exploit semantic links between words in the general lexicon, e.g., synonymy and antonymy links in WordNet. The procedure is typically iterative
Additional information can be used to make better lists: WordNet glosses or machine learning (classification based on concept definitions)
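As a sketch of the synonymy/antonymy idea, the code below propagates polarity from a seed word: synonyms inherit the polarity and antonyms receive the opposite one. The `SYNONYMS` and `ANTONYMS` tables are tiny hand-written stand-ins for a resource such as WordNet, and the function name `expand` and the `max_iterations` cap are assumptions made for illustration.

```python
from typing import Dict, Set

# Toy stand-in for a general lexicon such as WordNet (hand-written, hypothetical data).
SYNONYMS: Dict[str, Set[str]] = {
    "good": {"fine", "decent"},
    "bad": {"poor"},
    "poor": {"inferior"},
}
ANTONYMS: Dict[str, Set[str]] = {
    "good": {"bad"},
    "decent": {"indecent"},
}


def expand(seed: Dict[str, int], max_iterations: int = 5) -> Dict[str, int]:
    """Iteratively add synonyms (same polarity) and antonyms (flipped polarity)."""
    lexicon = dict(seed)
    frontier = set(seed)  # words added in the previous iteration
    for _ in range(max_iterations):
        next_frontier = set()
        for word in frontier:
            pol = lexicon[word]
            for syn in SYNONYMS.get(word, ()):  # synonyms keep the polarity
                if syn not in lexicon:
                    lexicon[syn] = pol
                    next_frontier.add(syn)
            for ant in ANTONYMS.get(word, ()):  # antonyms flip the polarity
                if ant not in lexicon:
                    lexicon[ant] = -pol
                    next_frontier.add(ant)
        if not next_frontier:  # nothing new was found: stop early
            break
        frontier = next_frontier
    return lexicon


expanded = expand({"good": +1})
```

Starting from the single seed "good", this toy run labels "fine" and "decent" as positive and "bad", "poor", "inferior", and "indecent" as negative. Errors in such chains accumulate with distance from the seed, which is one reason a manual cleansing step is often applied afterwards.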
Cons:
- Limited Coverage: Dictionaries may miss nuanced or domain-specific sentiments.
- Lack of Context Understanding: These approaches often treat words in isolation without considering their context.
- Difficulty Handling Negations and Modifiers: Sentiment analysis dictionaries may struggle with negations (e.g., “not good”) or modifiers (e.g., “very good”).
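The last two points can be made concrete with a small comparison. The sketch below contrasts a scorer that treats words in isolation with one that applies a simple heuristic: a negator flips, and an intensifier scales, the word that immediately follows. The lexicon, the negator and intensifier lists, and their weights are all assumptions chosen for illustration, not an established rule set.

```python
LEXICON = {"good": 1.0, "bad": -1.0}            # toy lexicon (assumed)
NEGATORS = {"not", "never"}                     # assumed negation words
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}   # assumed modifier weights


def naive_score(text: str) -> float:
    """Sum word polarities, treating each word in isolation."""
    return sum(LEXICON.get(tok, 0.0) for tok in text.lower().split())


def contextual_score(text: str) -> float:
    """Like naive_score, but preceding negators and intensifiers modify a clue."""
    total, sign, weight = 0.0, 1.0, 1.0
    for tok in text.lower().split():
        if tok in NEGATORS:
            sign = -sign
        elif tok in INTENSIFIERS:
            weight *= INTENSIFIERS[tok]
        else:
            if tok in LEXICON:
                total += sign * weight * LEXICON[tok]
            # Modifiers only affect the immediately following word.
            sign, weight = 1.0, 1.0
    return total
```

Here `naive_score("not good")` comes out positive because "not" is ignored, whereas `contextual_score("not good")` is negative. The heuristic is still brittle: in "not a good", the intervening word resets the negation.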