Automated analysis of product features in online reviews – Part 2

Two weeks ago, I briefly presented the topic of my bachelor thesis. And, as I can only guess, you have all been eagerly waiting for the results. Well, here they are:

The following are some of the questions and results from the analysis part of the thesis.

Which N-grams are most often marked as Aspect in each category?

The extracted aspects depend strongly on the product category. At the same time, semantically very similar N-grams such as battery/battery life or works/working appear within the same product context.
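
As an illustration, a per-category frequency count behind such a result could look roughly like this. This is only a minimal sketch, assuming the aspect extractor outputs (category, aspect) pairs; the variable names and example values are illustrative, not from the thesis.

```python
from collections import Counter, defaultdict

# Hypothetical input: (category, aspect) pairs produced by the aspect-extraction model.
extracted_aspects = [
    ("Cell Phones", "battery life"),
    ("Cell Phones", "screen"),
    ("Laptops", "battery"),
    # ...
]

# Count aspect N-grams per product category.
counts_per_category = defaultdict(Counter)
for category, aspect in extracted_aspects:
    counts_per_category[category][aspect.lower()] += 1

# Show the ten most frequent aspects for each category.
for category, counts in counts_per_category.items():
    print(category, counts.most_common(10))
```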

Which aspects are “universal”, i.e. frequently represented in all categories?

By sorting the extracted aspects by their average rank across product groups, aspects can be found that are frequent regardless of the product category. The most prominent ones are "general" aspects such as price, quality, service, or shipping.
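
A minimal sketch of this ranking step, assuming the per-category counts are already available in a pandas DataFrame (all column names and values here are illustrative):

```python
import pandas as pd

# Hypothetical frequency table: one row per (category, aspect) with its count.
df = pd.DataFrame({
    "category": ["Cell Phones", "Cell Phones", "Laptops", "Laptops"],
    "aspect":   ["price", "battery life", "price", "keyboard"],
    "count":    [120, 95, 80, 60],
})

# Rank aspects within each category (1 = most frequent) ...
df["rank"] = df.groupby("category")["count"].rank(ascending=False)

# ... and average the ranks across categories: aspects with a low mean rank
# are frequent regardless of the product category ("universal" aspects).
universal = df.groupby("aspect")["rank"].mean().sort_values()
print(universal.head(10))
```

One caveat of this simple averaging: aspects that are entirely missing from a category receive no rank there, so a real implementation has to decide how to penalize such gaps.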

What clusters are there in the “Cell Phones” category?

Clustering with K-Means and k = 30 clusters results in a distribution that can be interpreted as an approximate grouping of aspects. It can be seen, for example, that camera and pictures, or battery and battery life, end up in the same cluster due to their high semantic similarity.
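
Such a grouping could be reproduced along the following lines. The embedding model is an assumption (the post does not say which vector representation was used), and the toy aspect list is far too small for the actual k = 30.

```python
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

# Hypothetical list of aspect N-grams extracted from the "Cell Phones" category.
aspects = ["battery", "battery life", "camera", "pictures", "screen", "price"]

# Embed the aspects; the model name is only an example, not the one used in the thesis.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(aspects)

# Cluster the embeddings (k = 3 here because the toy list is small; the thesis used k = 30).
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(embeddings)

for aspect, label in zip(aspects, labels):
    print(label, aspect)
```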

Cell Phones: How positive or negative is the sentiment for each cluster on average? How strongly does the sentiment of a cluster influence the star rating of the review? (both between -1 and +1)

The sentiment labels predicted by the model (positive, negative, neutral, …) can also be mapped to numbers (+1, -1, 0, …). This allows a mean sentiment value per cluster to be computed afterwards. For cell phones, for example, it is visible that a mention of the "warranty" cluster is usually accompanied by a negative sentiment, whereas "Price & Quality" tends to receive a positive sentiment.
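
A minimal sketch of this mapping and aggregation, assuming the predictions are available as a table of aspect mentions (cluster names and labels are illustrative):

```python
import pandas as pd

# Hypothetical prediction table: one row per aspect mention with its cluster and sentiment label.
mentions = pd.DataFrame({
    "cluster":   ["warranty", "warranty", "price & quality", "price & quality"],
    "sentiment": ["negative", "negative", "positive", "neutral"],
})

# Map the predicted labels to numbers so that a mean per cluster can be computed.
label_to_score = {"negative": -1, "neutral": 0, "positive": 1}
mentions["score"] = mentions["sentiment"].map(label_to_score)

# Average sentiment per cluster, between -1 (negative) and +1 (positive).
print(mentions.groupby("cluster")["score"].mean())
```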

In addition, the connection (correlation) between this numerical sentiment mean and the star rating of the review can be examined. For example, it can be seen that the sentiment towards the "functionality" cluster has a strong influence on the star rating: if the sentiment towards "functionality" is high or low, the star rating is usually high or low as well. For "warranty", on the other hand, the correlation with the rating is much weaker.
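
The per-cluster correlation could be computed along these lines, again with purely illustrative data; a real analysis would use many reviews per cluster:

```python
import pandas as pd

# Hypothetical review-level table: mean sentiment towards a cluster in a review and its star rating.
reviews = pd.DataFrame({
    "cluster":        ["functionality", "functionality", "warranty", "warranty"],
    "mean_sentiment": [1.0, -0.5, -1.0, 0.0],
    "stars":          [5, 2, 3, 4],
})

# Pearson correlation between cluster sentiment and star rating, per cluster (-1 to +1).
correlations = reviews.groupby("cluster").apply(
    lambda g: g["mean_sentiment"].corr(g["stars"])
)
print(correlations)
```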

Plush Figures: Which adjectives indicate particularly polar sentiments towards each cluster? ("Opinion Words")

By automatically identifying word types with part-of-speech tagging and then examining the mean cluster sentiment of the sentences grouped by the adjectives that occur in them, adjectives can be identified that indicate a particularly positive or negative sentiment towards a cluster. In the "colors/light" cluster, for example, the adjectives bright and vibrant emerge as indicators of a positive cluster sentiment, and same and black as indicators of a negative one.
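
A sketch of this opinion-word extraction, assuming spaCy as the part-of-speech tagger (the thesis may have used a different one) and sentence-level sentiment scores as input; the example sentences are made up:

```python
import spacy
import pandas as pd

# Hypothetical sentence-level data: sentence text, its cluster, and its numeric sentiment.
sentences = pd.DataFrame({
    "text": [
        "The colors are bright and vibrant.",
        "The light looks the same as before.",
        "The figure arrived with black stains.",
    ],
    "cluster":   ["colors/light", "colors/light", "colors/light"],
    "sentiment": [1, -1, -1],
})

# The spaCy model is an assumption; any tagger that labels adjectives would do.
nlp = spacy.load("en_core_web_sm")

# Collect, for every adjective, the sentiments of the sentences it occurs in.
rows = []
for _, row in sentences.iterrows():
    for token in nlp(row["text"]):
        if token.pos_ == "ADJ":
            rows.append({"adjective": token.lemma_.lower(),
                         "cluster": row["cluster"],
                         "sentiment": row["sentiment"]})

# Mean cluster sentiment per adjective: a strongly positive or negative mean
# marks the adjective as an "opinion word" for that polarity.
opinion_words = pd.DataFrame(rows).groupby(["cluster", "adjective"])["sentiment"].mean()
print(opinion_words.sort_values())
```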

Conclusion of my bachelor thesis:

The application of text mining to extract structured data from natural-language sources such as online reviews remains an exciting topic that is finding its way into more and more applied systems. Business domains such as market research or research and development can benefit from the powerful systems that have emerged in recent years: they make it possible to automatically extract accurate information from very large data sets, gain insights into the feedback and wishes of customers, and make informed, data-driven decisions.

Image by Duncan Meyer on Unsplash