Since we're into all things big data and analytics, we can't resist sharing our thoughts on big data myths, and smashing them as hard as we can. This way we'll feel like MythBusters, at least for a minute.
Machines will replace humans
This is more of a philosophical assumption than anything else, but let's face it: someone has to create the machines, right? It was the same with cars. When the automobile carved its way onto the streets, there was significant opposition from omnibus companies, coachmen, and others who feared losing their jobs. Their fear turned out to be false and true at the same time.
The history of our civilization shows that every successful machine, from the simple dishwasher to complicated automation algorithms, eliminates work. In doing so, it gives humanity a thousand more options and opens doors people didn't even know existed. Sure, a lot of things are going to change, but we are still a long way from machines that can completely replace human input.
Big Data is for big business
It is true that most blue chip companies are deep into big data and analytics. It is also true that they can afford the big bucks needed for complicated tools and services, computing power, and the data scientists and BI experts who turn the output data into actionable insights.
It is also true that the democratization of big data is already here. A lot of startup companies, just like A4E, are going to disrupt the data analytics business model by targeting common business problems like sales forecasting. This way, analytics insights aren't going to be reserved for the private club of the blue chips.
Data analysis is easy
The huge number of data analytics and visualization tools might trick you into thinking this is a piece of cake. But this is a myth. In fact, you need a background in statistics, mathematics, modeling, and BI. Last but not least, it helps to have experience in the particular business domain. Earning a degree, expertise, and experience in these fields is anything but easy.
Big Data predicts the future
Predictive analytics is a big opportunity, but it is not wise to treat it as a one-size-fits-all approach. Take weather forecasting: you can't get a reliable forecast for a particular day half a year ahead. Weather can be forecast with high accuracy only about a week or 10 days ahead.
Any future performance depends on many variables. On top of this, there are many potential chance events of differing scale and impact. Nobody knows exactly what will happen, or when.
It is also true that big data and predictive analytics can forecast future performance based on historical data series. Such data series are key inputs for sales forecasting and demand planning.
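To make the idea of forecasting from a historical series a bit more concrete, here is a minimal sketch in Python. It fits a straight-line trend to past values and extends it a few steps ahead. The sales figures and the function names (`fit_linear_trend`, `forecast`) are made up for illustration, and a linear trend is only one of the simplest possible models, not a description of how any particular product works.

```python
def fit_linear_trend(series):
    """Least-squares line through (0, y0), (1, y1), ...; returns (slope, intercept)."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(series, steps):
    """Extend the fitted trend line `steps` periods past the end of the series."""
    slope, intercept = fit_linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly sales figures, invented for this example.
monthly_sales = [100, 110, 120, 130, 140, 150]
next_quarter = forecast(monthly_sales, 3)
print(next_quarter)
```

Note that the further ahead you extend the line, the less the history can tell you, which is exactly the weather forecasting point above.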
Data analytics is just a trendy hype
Data analytics has generated a lot of attention lately, and there is a reason for this. Unbiased data outcomes often give businesses insights that save time, money, and workforce. That's why everyone is pointing at big data and analytics as the painkiller for any business issue. Sometimes that's justified, sometimes not. So it's true that modern data analytics is trendy right now.
On the other hand, data analytics is much more than just hype. It wouldn't be wise to treat it like the business version of a social media meme. The modern business world is full of data, and big data at that. Data analytics will be a key element in putting it to use.
Bigger is better
It is a common myth that the larger the data, the better. Well, our chief data scientist says this is not true, because if you fill the data set with too many variables, there is more room for losing the logic. Alexander Efremov, PhD, also states that if you have too many observations, the performance of data mining is going to drop.
In the first case, an appropriate variable reduction technique is needed, and the best case is when business logic is available to assist the variable reduction process. Bringing in business logic complicates the data mining workflow, but it is the right way to fight wrong statistical conclusions during modelling. On the other hand, when the real system's behavior varies over time, many observations mean more irrelevant, i.e. out-of-date, data. And finally, even if the investigated system is time invariant, after a certain number of observations the computational burden, which increases exponentially, makes the modelling process too slow, without a noticeable improvement in model accuracy.
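As a rough illustration of what variable reduction can look like, the sketch below drops any variable that is almost perfectly correlated with one already kept. This is just one simple technique picked for the example, not necessarily the one our data scientists use. The column names, the numbers, and the 0.95 threshold are all assumptions made up for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists (assumes neither is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.95):
    """Keep a variable only if it is not strongly correlated with any variable kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data set: price_usd is just price_eur times 1.1, so it adds no information.
data = {
    "price_eur": [10, 12, 11, 15, 14, 13],
    "price_usd": [11.0, 13.2, 12.1, 16.5, 15.4, 14.3],
    "units_sold": [200, 150, 210, 120, 160, 140],
}
print(drop_correlated(data))
```

Here business logic would have told you the same thing faster: two prices in different currencies are one variable, not two. The code only catches redundancy that shows up in the numbers, which is why domain knowledge remains the best case.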