Suggestions & recommendations are products of data analytics and forecasting. This is how they are made

Every online store is eager to offer you something more, based on the interest you show in one product or another. Typically we give the recommendation a brief look; often we find it useful, and sometimes we even take advantage of it.

How are they made?

By data analytics, especially text mining.

The term describes the process of analyzing natural-language content by machine. It is commonly used and broadly applicable.

Let’s take a bookstore, for instance. You have a huge number of books, and the particular recommendations you see under any offering are based on a few components. The first one is the author: if you like some of John Grisham’s titles, it is highly probable that you’ll be interested in the rest of his books. But even if you are deep into the legal thriller genre, suggesting just a single author is far too narrow.

So we go deeper, into genres. Physics, crime thrillers, DIY, sci-fi, etc. are among the possible options. Genres are a key element of data preparation, and of categorization in particular. But you shouldn’t lock a user into a particular genre. Consumer interests are far broader than a single author or genre.

This is where text mining comes in.

Every single book has its own synopsis, or description if you like. Text mining captures this content and eliminates all the low-value words, such as conjunctions and pronouns, until only the valuable pieces of data remain. This is easy for you, but computers are far from human, and it is a pretty difficult, yet not impossible, task for them. They are, however, much better than us at calculation, so it is fast and easy for them to count particular terms, measure their density in a given piece of content, and establish a link between one text block and another.
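To make the filtering and counting step concrete, here is a minimal Python sketch. The stop-word list, the function names and the sample synopsis are all made up for illustration; a real system would use a much longer word list and smarter tokenization.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "on", "to", "is",
    "he", "she", "it", "his", "her", "but", "or", "who", "for", "with",
}

def extract_terms(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]

def term_density(text):
    """Return each remaining term's share of all remaining terms."""
    terms = extract_terms(text)
    if not terms:
        return {}
    counts = Counter(terms)
    return {term: count / len(terms) for term, count in counts.items()}

# Invented sample synopsis, not taken from any real book listing.
synopsis = "A young lawyer joins a law firm and discovers the firm is hiding a secret."
terms = extract_terms(synopsis)
density = term_density(synopsis)
print(terms)
print(density["firm"])
```

Running the same functions over every synopsis in the catalog gives the per-book term counts that the later comparison between books can build on.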

This is achievable via Latent Semantic Analysis (LSA), a natural language processing technique that isolates sets of similar documents or similar terms. The idea behind LSA rests on the assumption that terms with similar meanings are more likely to occur in similar documents. First, the so-called document-term matrix is constructed, which contains the term frequencies (columns) per document (rows). It is then decomposed into three matrices by Singular Value Decomposition (SVD). A significance analysis is performed in order to reduce their sizes, keeping only the largest singular values. Finally, the documents are compared by calculating a correlation between the rows of the first matrix. In fact, this correlation coefficient corresponds to the cosine of the angle between the two document vectors. Hence values close to 1 correspond to similar documents, while values close to 0 or negative indicate documents that are dissimilar or even opposite (with respect to the terms that occur in them).
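The LSA steps can be sketched in a few lines of Python with NumPy. Everything here is an assumption made for illustration: the three toy "books", their term counts, the choice of k = 2 retained singular values, and the function names. The sketch also weights the rows of the first matrix by the singular values, which is one common variant rather than the only option.

```python
import numpy as np

# Hypothetical toy data: rows are documents, columns are term frequencies
# for the terms lawyer, trial, court, galaxy, spaceship.
doc_term = np.array([
    [3, 2, 1, 0, 0],  # legal thriller A
    [2, 3, 2, 0, 0],  # legal thriller B
    [0, 0, 0, 3, 2],  # space opera
], dtype=float)

def lsa_document_vectors(matrix, k):
    """Decompose with SVD and keep only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Rows of the first matrix, weighted by the retained singular values.
    return u[:, :k] * s[:k]

def cosine(a, b):
    """Cosine of the angle between two document vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = lsa_document_vectors(doc_term, k=2)
sim_legal = cosine(docs[0], docs[1])  # two legal thrillers
sim_mixed = cosine(docs[0], docs[2])  # legal thriller vs. space opera
print(round(sim_legal, 3), round(sim_mixed, 3))
```

In this toy case the two legal thrillers end up pointing in the same direction (cosine close to 1), while the space opera shares no terms with them and lands at a cosine close to 0.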

Let’s get back to John Grisham and his legal thrillers. This is how data analytics on the Barnes & Noble website, for instance, recommends checking out not just his other titles but also books by David Baldacci, Michael Connelly, James Patterson, Lee Child and so on.

Cross this text mining product with the data on previous purchases, bring it in line with real people’s behavior, and you have analytics forecasting at its best.

Need help with data analytics, text mining, sales forecasting and demand prediction? Drop us a line at our website:
