Today more than ever, every business is focused on collecting data and applying analytics to stay competitive. Big data analytics has passed the hype stage and become an essential part of business plans.
Data lake is the latest buzzword for dumping every element of data you can find, internally or externally. If you Google the term "data lake", you will get more than 14 million results. With the arrival of Hadoop, everyone wants to dump their silos of data warehouses and data marts and create a data lake.
The idea behind a data lake is to have one central platform to store and analyze every kind of data relevant to the enterprise. With digital transformation, the volume of data generated every day has multiplied several times over, and businesses are collecting consumer data, Internet of Things data and other data for further analysis.
As storage has become cheaper, more data is being stored in its raw format in the hope of finding nuggets of information, but eventually it becomes difficult to find anything in it. It is like using your smartphone to take photographs left, right and center: when you want to show one specific photograph to someone, it is very hard to find.
Data lakes, if not maintained properly, can grow aimlessly and consume the entire budget. Some companies have data lakes overflowing from their on-premises systems into the cloud.
Most data lakes lack governance and the tools and skills to handle large volumes of disparate data, and many lack a compelling business case. But the water (the data) in your data lake has to be crystal clear and drinkable, or else the lake becomes a swamp.
Before jumping on the bandwagon and creating a data lake that may cost thousands of dollars and take months to implement, you should start by asking these questions.
- What data do we want to store in the data lake?
- How much data needs to be stored?
- How will we access these massive amounts of data and get value from them easily?
Here are some guidelines to avoid drowning in your data lake.
- First and foremost, create one or more business use cases that lay out exactly what will be done with the data that gets collected. That exercise will keep you from dumping meaningless data.
- Determine the returns you want to get out of the data lake. Developing a data lake is not a casual undertaking; you need solid business benefits coming out of it.
- Make sure your overall big data and analytics initiatives are designed to exploit the data lake fully and help achieve business goals.
- Instead of falling into vendor traps and their buzzwords, focus on your needs and determine the best way to meet them.
- Deliver the data to a wide audience who can check it and come back with feedback while creating value.
Many cloud vendors can help you build data lakes, such as Microsoft Azure and Amazon S3. By making data available to data scientists and anyone else who needs it, for as long as they need it, data lakes are a powerful lever for innovation and disruption across industries.