Twitter Sentiment Analysis



Understanding user sentiment to aid mental health diagnosis


August 1st, 2016


#engineering #machine-learning #artificial-intelligence

Motivation

  • Healthcare costs are extremely high, particularly in the United States
  • Public awareness of, and support for, mental health care is increasing

Solution created

  • Pipeline to gather tweets on two polar topics to understand users’ sentiment towards them
  • Demonstration that identifies those users who use depression-indicative language
  • Useful to mental health professionals for identifying long-term trends in a user’s mental health

Pipeline overview

  1. Data collection: Collected nearly 4,000 tweets from the Twitter Developer API and labelled them based on the hashtags present. For example, tweets containing “depressed” (or related hashtags) were labelled as belonging to the “depressive-indicative” class; tweets containing “happy” (or related hashtags) were labelled as part of the “non-depressive-indicative” class.
  2. Understand the user’s position in the Twitter community: Call the Twitter API to gain information about the user’s followers, followees, average retweet counts, and more.
  3. Data analysis: Send each of the 4,000 tweets through IBM Watson’s Tone Analyzer API to gain more dimensions of sentiment information about each tweet.
  4. Classification model: Use the labelled data to discriminate between tweets that are “depressive-indicative” and those that are not, based on their language characteristics. Trained the classification model with scikit-learn’s k-Nearest Neighbors implementation.
  5. Classify an unknown user: Given an unknown user, generate visualizations and an overall classification of the language in their tweets.
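
The labelling and classification steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the project's actual code: the hashtag sets, the two-value feature vectors (imagined here as sadness and joy scores from a tone analysis step), and the helper names `label_tweet` and `knn_predict` are all assumptions made for the example. The project itself used scikit-learn's k-Nearest Neighbors implementation; the hand-written majority vote below only stands in for it to show the idea.

```python
import math
from collections import Counter

# Hypothetical hashtag sets; the project's real lists are not given in the text.
DEPRESSIVE_TAGS = {"depressed", "depression"}
HAPPY_TAGS = {"happy", "happiness"}


def label_tweet(text):
    """Label a tweet from its hashtags; return None if no known hashtag is present."""
    tags = {
        word.lstrip("#").lower().strip(".,!?")
        for word in text.split()
        if word.startswith("#")
    }
    # Assumption: if both kinds of hashtag appear, the depressive label wins.
    if tags & DEPRESSIVE_TAGS:
        return "depressive-indicative"
    if tags & HAPPY_TAGS:
        return "non-depressive-indicative"
    return None


def knn_predict(train_features, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_features, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up feature vectors (sadness, joy) purely for illustration.
TRAIN_X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.1, 0.9), (0.2, 0.8), (0.1, 0.7)]
TRAIN_Y = ["depressive-indicative"] * 3 + ["non-depressive-indicative"] * 3

if __name__ == "__main__":
    print(label_tweet("Rough week #depressed"))
    print(knn_predict(TRAIN_X, TRAIN_Y, (0.75, 0.15)))
```

With scikit-learn, the vote would typically be replaced by fitting a `KNeighborsClassifier` on the same feature matrix and labels and calling its `predict` method on the features of an unknown user's tweets.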

Technologies used

  • Python + Django web framework
  • scikit-learn & IBM Watson intelligence APIs
  • chart.js & Material Bootstrap
