This study analyzes how textual metrics such as Sentiment Polarity, Review Length, and Readability Score affect user engagement in Amazon Movie Reviews.
Online consumer reviews are a cornerstone of modern decision-making, especially on platforms like Amazon and IMDb, where millions of reviews guide choices for
movies and TV shows. These platforms often include helpfulness voting mechanisms, allowing users to indicate a review’s usefulness. Despite their importance,
the factors driving perceived helpfulness—such as review length, sentiment, and readability—remain underexplored. This study focuses on the movie and TV show
domain to uncover how these elements influence engagement, building on prior research suggesting that emotional intensity, review depth, and clarity play key
roles.
Understanding these dynamics is vital for optimizing review visibility, enhancing consumer decisions, and empowering content creators to write impactful reviews.
It also contributes to broader insights into digital consumer behavior and natural language processing.
What factors influence the likelihood that a movie or TV show review receives more helpfulness votes, i.e., higher perceived helpfulness?
Three hypotheses guide the analysis: reviews of moderate length receive more helpfulness votes than very short or very long ones (H1); reviews within an accessible readability range receive more votes (H2); and reviews expressing extreme sentiment, whether positive or negative, receive more votes than neutral ones (H3). These hypotheses are grounded in prior studies suggesting that extreme sentiments engage readers emotionally, while balanced length and readability optimize informativeness and accessibility.
The study uses the Amazon Review/Product Dataset from UCSD, spanning May 1996 to October 2018, with 233.1 million reviews across categories. This project focuses on the "Movies and TV Shows" subset, pre-processed to include 10,419 reviews after filtering out rows with missing votes and extreme lengths (under 10 or over 1024 words).
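As a minimal sketch of this filtering step, assuming the subset has been loaded into a data frame reviews_raw with columns named reviewText and vote (these column names follow the UCSD dump but are assumptions here):

```r
library(dplyr)
library(stringr)

# Assumed input: data frame `reviews_raw` with columns `reviewText` and `vote`.
reviews <- reviews_raw %>%
  filter(!is.na(vote)) %>%                                          # drop rows with missing helpfulness votes
  mutate(word_count = str_count(reviewText, boundary("word"))) %>%  # approximate word count per review
  filter(word_count >= 10, word_count <= 1024)                      # keep reviews between 10 and 1024 words
```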
This study employs a comprehensive pipeline: corpus creation and summarization; tokenization and data pre-processing; construction of a document-feature matrix (DFM) and a feature co-occurrence matrix (FCM); word clouds; readability scoring; review length analysis; sentiment analysis; topic modelling with Latent Dirichlet Allocation (LDA); and Negative Binomial regression to analyze review helpfulness.
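The corpus, tokenization, and DFM/FCM stages could look roughly like the following quanteda sketch, reusing the assumed reviews data frame and reviewText column from above:

```r
library(quanteda)

# Build a corpus from the filtered reviews (text column name is an assumption).
corp <- corpus(reviews, text_field = "reviewText")
summary(corp, n = 5)   # quick summary of the first few documents

# Tokenize and pre-process: lowercase, drop punctuation, numbers, and stopwords.
toks <- tokens(corp, remove_punct = TRUE, remove_numbers = TRUE) |>
  tokens_tolower() |>
  tokens_remove(stopwords("en"))

# Document-feature matrix and feature co-occurrence matrix.
dfmat <- dfm(toks)
fcmat <- fcm(toks, context = "window", window = 5)
```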
The analysis was implemented in R, using quanteda and tidytext for text processing, textstat_readability for readability scoring, and the SentimentAnalysis package for sentiment, which was compared against liwcalike. Due to overdispersion in vote counts (mean: 7.97, variance: 513.12), a Negative Binomial regression model was used to evaluate the effects of length, polarity, and readability on votes.
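A hedged sketch of the scoring and modelling steps, carrying over the objects from the sketches above and the assumed vote and word_count columns (the choice of the Flesch measure and the QDAP polarity column is also an assumption):

```r
library(quanteda.textstats)   # textstat_readability()
library(SentimentAnalysis)    # analyzeSentiment(), convertToDirection()
library(MASS)                 # glm.nb()

# Readability score (Flesch reading ease) per review.
read_scores <- textstat_readability(corp, measure = "Flesch")

# Dictionary-based sentiment polarity per review.
# (The study also compares these scores against quanteda.dictionaries::liwcalike().)
sent <- analyzeSentiment(as.character(corp))

model_df <- data.frame(
  votes       = as.numeric(reviews$vote),   # coerce in case votes are stored as text
  length      = reviews$word_count,         # raw word count from the filtering step
  polarity    = sent$SentimentQDAP,
  readability = read_scores$Flesch
)

# Vote counts are overdispersed (variance far exceeds the mean),
# so a Negative Binomial model is preferred over Poisson regression.
nb_fit <- glm.nb(votes ~ length + polarity + readability, data = model_df)
summary(nb_fit)
```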
Below are visualizations illustrating key insights from the analysis.
Figure 1: Distribution of Review Lengths shows most reviews fall between 88 and 310 words, with a median of 171, supporting the hypothesis of an optimal length range.
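The length summary behind Figure 1 can be reproduced along these lines, reusing the assumed word_count column:

```r
# Quartiles and median of review length (Figure 1).
quantile(reviews$word_count, probs = c(0.25, 0.50, 0.75))

# Histogram of review lengths.
hist(reviews$word_count, breaks = 50,
     main = "Distribution of Review Lengths", xlab = "Words per review")
```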
Figure 2: Mean Vote by Sentiment reveals that positive reviews (mean 8.32 votes) and negative reviews (6.8) receive more votes than neutral ones (5.03), aligning with H3.
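This summary could be computed roughly as follows, assuming the polarity scores in model_df are bucketed into directions with SentimentAnalysis::convertToDirection():

```r
library(dplyr)

# Bucket polarity into negative / neutral / positive and average the votes (Figure 2).
model_df %>%
  mutate(direction = SentimentAnalysis::convertToDirection(polarity)) %>%
  group_by(direction) %>%
  summarise(mean_vote = mean(votes), n_reviews = n())
```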
Figure 3: The Document-Feature Matrix (DFM) and Feature Co-occurrence Matrix (FCM) illustrate word frequencies and relationships, highlighting common terms such as "film," "movie," and "good" that drive review content, while words such as "first," "look," and "little" have fewer connections.
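The frequency and co-occurrence views in Figure 3 could be drawn with quanteda's plotting helpers, reusing dfmat and fcmat from the earlier sketch:

```r
library(quanteda)
library(quanteda.textplots)

# Most frequent terms in the DFM and a word cloud (Figure 3).
topfeatures(dfmat, 20)
textplot_wordcloud(dfmat, max_words = 100)

# Co-occurrence network over the 30 most frequent features.
top_feats <- names(topfeatures(fcmat, 30))
textplot_network(fcm_select(fcmat, pattern = top_feats), min_freq = 0.5)
```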
Figure 4: Topic Modelling with LDA reveals seven distinct topics in reviews (e.g., action films, horror, family stories), showing thematic diversity that may influence perceived helpfulness.
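A sketch of the topic model behind Figure 4, fitting seven topics with the topicmodels package (the seed value is an arbitrary choice for reproducibility):

```r
library(quanteda)
library(topicmodels)

# Convert the quanteda DFM into the format expected by topicmodels.
dtm <- convert(dfmat, to = "topicmodels")

# Fit a seven-topic LDA model (Figure 4).
lda_fit <- LDA(dtm, k = 7, control = list(seed = 1234))

# Inspect the top 10 terms per topic (e.g., action, horror, family themes).
terms(lda_fit, 10)
```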
These findings partially support H1 and H2 (curvilinear trends were observed) and fully support H3 (extreme sentiments drive engagement).
This study reveals that review length and readability significantly influence helpfulness votes, with optimal ranges enhancing engagement, while extreme sentiments (positive or negative) consistently outperform neutral ones. These insights can guide reviewers and platforms in crafting and prioritizing impactful content.
1. Dashtipour, K., et al. (2021). Sentiment analysis of Persian movie reviews using deep learning. Entropy, 23(5), 596.
2. Qaisar, S. M. (2020). Sentiment analysis of IMDb movie reviews using LSTM. ICCIS, 1-4.
3. Kumar, S., et al. (2020). Movie recommendation system using sentiment analysis. IEEE Transactions on Computational Social Systems, 7(4), 915-923.
4. Daeli, N. O. F., & Adiwijaya, A. (2020). Sentiment analysis on movie reviews using Information Gain and KNN. Journal of Data Science and Its Applications, 3(1), 1-7.