Automated analysis of product features in online reviews – Part 1

Large platforms such as Amazon and Google collect online reviews as free text, in which users describe their opinion of or experience with a product or service. The volume of such reviews is now very large, and because the structure of a review is entirely up to the user, analyzing them automatically is a research topic in its own right. In my bachelor thesis, I took a closer look at one subarea of this research: Aspect-Based Sentiment Analysis. Read more about this in my blog…

Aspect-Based Sentiment Analysis is about automatically extracting the product features (aspects) mentioned in online reviews, together with the sentiment expressed towards them. As an example, I carried out such an automated analysis on a large scale.
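To make the task concrete, here is a minimal sketch of how a single annotated review could be represented in Python. The class names, field names, and the three-way sentiment label set are assumptions for illustration only, not the annotation format used in the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AspectMention:
    """One product feature mentioned in a review, with its sentiment."""
    term: str        # aspect text as it appears in the review
    start: int       # character offset where the term begins
    end: int         # character offset just past the term
    sentiment: str   # assumed label set: "positive", "negative", "neutral"


@dataclass
class AnnotatedReview:
    text: str
    aspects: List[AspectMention]


def annotate(text: str, labels: Dict[str, str]) -> AnnotatedReview:
    """Build an AnnotatedReview by locating each labeled term in the text."""
    mentions = []
    for term, sentiment in labels.items():
        start = text.find(term)  # first occurrence only, enough for a sketch
        if start == -1:
            raise ValueError(f"term {term!r} not found in review")
        mentions.append(AspectMention(term, start, start + len(term), sentiment))
    return AnnotatedReview(text, mentions)


# Hypothetical example review, not taken from the thesis dataset.
review = annotate(
    "The battery life is great, but the screen scratches easily.",
    {"battery life": "positive", "screen": "negative"},
)
for mention in review.aspects:
    print(mention.term, "->", mention.sentiment)
```

A single review can carry several aspects with different sentiments, which is what distinguishes this task from assigning one overall sentiment to the whole review.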

Training of various models

I trained various models (artificial neural networks) for this task using machine learning, with a particular focus on so-called Transformer models. These share a specific deep learning architecture that was first introduced in 2017 by researchers at Google, including its deep learning division Google Brain, and that has become increasingly relevant in natural language processing (NLP) in recent years. In many areas, these models set the state of the art, meaning they are among the best-performing models available. In particular, there are various pre-trained models, i.e. models that have already been trained on a large amount of data and one or more NLP tasks. To learn a new task, only fine-tuning is then necessary (“transfer learning”). This lets the model reuse the abstract representations of natural language it has already learned and so learn the new task more effectively than a neural network trained from scratch.

Overview of the procedure:

For my work, I used online reviews from an online retailer. First, five product categories were selected, and several hundred reviews from each were manually annotated according to the Aspect-Based Sentiment Analysis scheme described above. Various models were then trained and evaluated on this dataset. The best-performing models were used to automatically label aspects and sentiments in a large corpus (about 400,000 reviews). Finally, I clustered (grouped) the aspects predicted by the models using different approaches and analyzed the resulting clusters.
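The last step, grouping predicted aspects and looking at the sentiment per group, can be illustrated with a deliberately simple sketch. It groups aspect terms by a crude normalized form and counts sentiments per group. This is only a stand-in: the clustering approaches actually used in the thesis are not described here, and the function names and example predictions are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(term: str) -> str:
    """Crude normalization: lowercase, trim, strip a simple plural 's'."""
    t = term.lower().strip()
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return t


def group_aspects(predictions):
    """Group (term, sentiment) pairs by normalized term and count sentiments."""
    groups = defaultdict(Counter)
    for term, sentiment in predictions:
        groups[normalize(term)][sentiment] += 1
    return groups


# Hypothetical model output: one (aspect term, sentiment) pair per mention.
predictions = [
    ("Screen", "negative"),
    ("screens", "negative"),
    ("screen ", "positive"),
    ("price", "positive"),
    ("Price", "positive"),
]

groups = group_aspects(predictions)
for aspect in sorted(groups):
    print(aspect, dict(groups[aspect]))
```

Exact-match grouping like this misses synonyms such as "screen" and "display", which is one reason to cluster aspects with more capable methods on a corpus of this size.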

The results, and especially the conclusions I was able to draw from them, will be published in two weeks. So stay tuned!

Image by Towfiqu barbhuiya on Unsplash
