A project from the Human-Centered Data Science lab led by Dr. Cecilia Aragon.
In this project, we applied our trained machine learning classifiers to the entire review dataset from Fanfiction.net, which contains over 170 million text reviews, to identify the emotions the reviews express. We also designed data visualizations of the results to make them more accessible to other people.
Online communities and social media have become popular places for people to make interpersonal connections and find social support. Fanfiction.net is one of the largest repositories of online fanfiction in the world, and a large number of young adults write and comment on fanfiction stories there every day. Previous research has found that these reviews contain a range of emotions that help teenagers explore their own identities and foster a collaborative and supportive learning environment.
We want to explore the role of positive feelings in forming connections between people. In this part of the research, we focused on running and evaluating our machine learning classifiers for positive emotions, and on designing data visualizations to show the results.
We first qualitatively coded text data to train our machine learning classifiers. After evaluating the classifiers, we designed data visualizations for the dataset.
We conducted qualitative coding using Text Prizm, a tool developed by the Human-Centered Data Science Lab that helps researchers conduct collaborative qualitative analysis on large social media datasets.
We coded the reviews based on emotion, valence, and context.
The emotion codes available in Text Prizm were developed using a grounded theory approach to building a taxonomy of emotion codes.
There are 11 emotion codes available for us so far: Like, Joy / Happiness, Anticipation / Hope, Surprise, Sadness, Confused, Dislike, Disturbed / Disgust, Anger / Frustration, No emotion, Unknown.
Our machine learning classifiers were designed for three positive emotions: "Like", "Joy / Happiness", and "Anticipation / Hope". After we confirmed that they performed well on our text dataset of 11,292 reviews (with F scores over 80%), each member designed a data visualization of the results.
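As a rough illustration of what an F score measures, the sketch below computes it for a single emotion label from human-coded and predicted labels. It assumes the balanced F1 variant (the harmonic mean of precision and recall), which the write-up does not specify, and the label lists are hypothetical rather than taken from our dataset.

```python
def f1_score(true_labels, predicted_labels):
    """Return the F1 score for one binary emotion label (1 = emotion present)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    if true_pos == 0:
        # No correct positive predictions: define the score as 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for "Like" on six reviews (not real project data).
human_coded = [1, 1, 0, 1, 0, 1]
classifier_output = [1, 1, 0, 0, 1, 1]
like_f1 = f1_score(human_coded, classifier_output)
print(f"F1 for 'Like': {like_f1:.2f}")
```

With one classifier per emotion, a check like this would be repeated for each of the three positive emotions.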
I designed a chord diagram to visualize the results. I mapped each arc to an emotion, and each chord represents the reviews in our data that share two emotions. The chord diagram gives users insight into the relationships between pairs of emotions in our dataset.
When users hover over a chord, they can see the number of reviews that contain both of the emotions it connects. A chord that loops back to its own arc indicates the number of reviews that contain only that emotion.
From the chord diagram of the text dataset, users can gain further insight, such as the correlation between different emotions, the number of text reviews containing a given emotion, and the contrast in text sizes.
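The data behind a chord diagram like this is a square co-occurrence matrix. The sketch below shows one way such a matrix could be built from per-review emotion labels; the function name, the sample reviews, and the choice to store "only this emotion" counts on the diagonal are illustrative assumptions, not the exact code used in the project.

```python
from itertools import combinations

EMOTIONS = ["Like", "Joy / Happiness", "Anticipation / Hope"]


def build_chord_matrix(review_labels, emotions=EMOTIONS):
    """Count emotion pairs per review; the diagonal counts single-emotion reviews."""
    index = {emotion: i for i, emotion in enumerate(emotions)}
    size = len(emotions)
    matrix = [[0] * size for _ in range(size)]
    for labels in review_labels:
        present = sorted({index[label] for label in labels if label in index})
        if len(present) == 1:
            # A review with exactly one emotion feeds the self-looping chord.
            only = present[0]
            matrix[only][only] += 1
        else:
            # Each pair of co-occurring emotions feeds one chord (kept symmetric).
            for i, j in combinations(present, 2):
                matrix[i][j] += 1
                matrix[j][i] += 1
    return matrix


# Hypothetical predicted labels for five reviews (not real project data).
sample_reviews = [
    ["Like"],
    ["Like", "Joy / Happiness"],
    ["Like", "Joy / Happiness"],
    ["Anticipation / Hope"],
    ["Like", "Anticipation / Hope"],
]
chord_matrix = build_chord_matrix(sample_reviews)
for emotion, row in zip(EMOTIONS, chord_matrix):
    print(emotion, row)
```

A matrix in this shape can then be handed to a chord layout in a visualization library, where each off-diagonal cell becomes the count shown on hover.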
I also applied word zones, designed by Professor Marti Hearst, to our emotion dataset. Word zones are semantically grouped word clouds. I generated word zones categorized by fandom characters, by verbs per emotion, and by all words per emotion, based on word frequencies in our dataset.
This word zone is categorized by work and shows the three most popular works, each with its five most frequently mentioned characters.
We can see that Harry Potter characters appear in the reviews most frequently, with "Harry", "Hermione", and "Draco" mentioned most often.
This word zone shows the five most frequent verbs for each of the three positive emotions we explored.
We can see that the word "LOVE" appears most frequently in "Joy / Happiness", while "update" and "hope" mostly represent "Anticipation / Hope" in reviews.
This word zone shows the five most frequent words, including nouns, verbs, and other parts of speech, for each of the three positive emotions.
The words are also sized according to their frequencies in our dataset.
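The grouping behind the emotion-based word zones comes down to counting word frequencies per emotion and keeping the top few. A minimal sketch follows, assuming simple regex tokenization and hypothetical sample reviews; restricting the counts to verbs would additionally need a part-of-speech tagger, which is left out here.

```python
import re
from collections import Counter, defaultdict


def top_words_by_emotion(labeled_reviews, top_n=5):
    """Return the top_n most frequent words for each emotion label."""
    counts = defaultdict(Counter)
    for text, emotion in labeled_reviews:
        # Lowercase and keep alphabetic tokens (apostrophes allowed).
        words = re.findall(r"[a-z']+", text.lower())
        counts[emotion].update(words)
    return {emotion: counter.most_common(top_n) for emotion, counter in counts.items()}


# Hypothetical (review text, emotion) pairs, not real project data.
sample_reviews = [
    ("love love this chapter", "Joy / Happiness"),
    ("please update soon", "Anticipation / Hope"),
    ("hope you update", "Anticipation / Hope"),
]
zones = top_words_by_emotion(sample_reviews, top_n=2)
for emotion, words in zones.items():
    print(emotion, words)
```

The per-word counts returned here are what would drive the font size of each word within its zone.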
Through coding one of the largest existing text corpora, understanding the emotions it expresses, and prototyping data visualizations, I realized that data-driven opportunities can foster connection and empathy between individuals and create more possibilities in design. Moreover, the insights from this research can support the formation and sustenance of interpersonal connections online and help maintain a supportive environment for teenagers.