As we move into 2018, Analytical Data Infrastructure (ADI) is becoming a significant topic in business intelligence and analytics. Where Big Data was once an over-hyped, catch-all term, in the coming year we will see organisations move to a place where business-oriented ‘data strategies’ are the major focus. With that shift comes the need for sophisticated yet easy-to-use data science approaches that deliver results back to the business.
It is a point backed up by the 2018 Global Dresner Market Study for Analytical Data Infrastructure. The highly regarded report revealed businesses’ key priorities for their data analytics and business intelligence efforts. From deployment and loading priorities to the preparation, modelling and management of data associated with ADI, the study captured the most important current market trends driving the intelligent adoption of data science.
The Dresner report explored the ways in which end users are planning to invest in ADI technology in the year ahead, along with the considerations behind implementation and use cases. While security and performance were listed as the top two priorities for businesses, an interesting finding was that the biggest year-on-year change was the growing importance of easy access to, and use of, analytical features and programming languages – such as R, machine learning technology and MapReduce analytics.
Businesses have woken up to the fact that there is value in their data. With the right tools, they can extract that value – tapping into insight to improve the way they sell to their customers, or to streamline business processes and reduce costs.
But often, data has to be extracted, cleansed and transferred to other systems. In most companies, the Business Intelligence competence centres are separate from the Data Science teams, and the two rarely work closely together. Modern analytics platforms combine these two worlds, allowing SQL-based data analytics, MapReduce algorithms and data science languages such as R or Python to run side by side. Many database vendors offer such capabilities, and some have even integrated these languages tightly into their databases, allowing organisations to run data science on huge data sets.
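To make that ‘side by side’ point concrete, here is a minimal, vendor-neutral sketch using Python’s built-in sqlite3 module: the same small table is queried first with a standard SQL aggregate (the BI world) and then used to fit a simple trend model in Python (the data science world). The table and figures are invented purely for illustration.

```python
import sqlite3

# A small in-memory table standing in for, say, daily sales.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(d, 100.0 + 2.5 * d) for d in range(1, 31)])

# The BI world: a standard SQL aggregate.
total, avg = conn.execute(
    "SELECT SUM(revenue), AVG(revenue) FROM sales").fetchone()
print(f"total={total:.0f}, average={avg:.1f}")

# The data science world: fit a least-squares trend in Python on
# the very same data, without shipping it to a separate system.
rows = conn.execute("SELECT day, revenue FROM sales").fetchall()
n = len(rows)
sx = sum(d for d, _ in rows)
sy = sum(r for _, r in rows)
sxx = sum(d * d for d, _ in rows)
sxy = sum(d * r for d, r in rows)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(f"trend: revenue ≈ {intercept:.1f} + {slope:.2f} * day")
```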
While cleansing data and finding the right models is an iterative task that can happily run on smaller data sets, high-performance in-memory computing can make a vast difference when applying the resulting R or Python models to billions of records in near real time.
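The difference is easy to see even on a laptop. The sketch below applies a hypothetical, hand-set linear model first record by record in a plain Python loop, then to the whole batch at once with vectorised in-memory operations – the style of execution that in-memory analytics engines exploit. It assumes only numpy; the coefficients and data are invented.

```python
import time

import numpy as np

# Hypothetical ‘trained’ model: a linear scoring function whose
# coefficients would normally come out of R or Python training.
coef = np.array([0.4, -1.2, 3.0])
intercept = 0.7

rng = np.random.default_rng(42)
features = rng.normal(size=(2_000_000, 3))  # stand-in for a big table

# Naive application: score one record at a time in a Python loop.
t0 = time.perf_counter()
slow_scores = [float(row @ coef + intercept) for row in features[:100_000]]
t_loop = time.perf_counter() - t0

# In-memory, vectorised application: score the whole batch at once.
t0 = time.perf_counter()
fast_scores = features @ coef + intercept
t_vec = time.perf_counter() - t0

print(f"loop over 100k rows: {t_loop:.2f}s; "
      f"vectorised over 2M rows: {t_vec:.2f}s")
```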
Letting analysts use the data science tools of their choice
Data analysts have their favoured analytics and visualisation tools, which leads either to a sprawl of different tools that must be integrated and maintained in the data management ecosystem, or to people not cooperating with each other. Furthermore, the choice of data science scripting language is often a personal preference: each language has its own strengths and weaknesses depending on the complexity of the task and the features the language offers.
As we move from an era of descriptive analytics (looking at past trends) to predictive analytics (looking to the future) and, for the most advanced analysis, even prescriptive analytics (finding the best course of action to meet key performance indicators), the combination of AI and standard SQL analytics can create more agility and efficiency in extracting the right insights from data.
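As a toy illustration of those three stages, the sketch below computes a descriptive summary, a naive linear forecast and a simple profit-maximising stock recommendation from a handful of invented demand figures – a deliberately simplified stand-in for the real statistical models each stage would use.

```python
# Illustrative monthly demand figures (invented for this example).
history = [120, 132, 141, 150, 163, 171]

# Descriptive: what happened in the past?
average = sum(history) / len(history)
growth = (history[-1] - history[0]) / (len(history) - 1)
print(f"average demand {average:.0f}, average monthly growth {growth:.1f}")

# Predictive: what is likely to happen next? (naive extrapolation)
forecast = history[-1] + growth
print(f"forecast for next month: {forecast:.0f}")

# Prescriptive: what should we do about it? Choose the stock level
# that maximises a toy profit KPI, given the forecast.
price, cost = 10.0, 6.0

def profit(stock):
    return price * min(stock, forecast) - cost * stock

best = max(range(150, 201, 10), key=profit)
print(f"recommended stock level: {best}")
```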
The good news is that there are platforms that support virtually any data science language within the same system, alongside standard database technologies. Exasol version 6.0, for instance, offers an open-source integration framework that lets you install any programming language and use it directly inside the SQL database. R, Python, Java and Lua come pre-shipped, but you can also create containers for Julia, Scala, C++ or a language of your choice.
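As a sketch of what that can look like in practice, the snippet below registers a Python UDF via SQL and then calls it from an ordinary query. It is loosely modelled on Exasol’s documented CREATE ... SCRIPT pattern, but the exact syntax varies by version, and the connection object, script name, scoring logic and customers table are all hypothetical – check the vendor documentation before relying on any of it.

```python
# Assumption: `conn` is a DB-API 2.0 connection from the vendor's
# Python driver. The CREATE SCRIPT syntax is modelled on Exasol's
# documented UDF pattern and may differ between versions.

CREATE_UDF = """
CREATE OR REPLACE PYTHON SCALAR SCRIPT churn_score(age DOUBLE, spend DOUBLE)
RETURNS DOUBLE AS
def run(ctx):
    # Stand-in for a real trained model: any Python logic (or an
    # imported library) can execute here, right next to the data.
    return 0.01 * ctx.age - 0.001 * ctx.spend
"""

SCORE_ALL = """
SELECT customer_id, churn_score(age, spend) AS risk
FROM customers
ORDER BY risk DESC
"""

def deploy_and_score(conn):
    """Register the UDF, then score every customer inside the database."""
    cur = conn.cursor()
    cur.execute(CREATE_UDF)   # one-off: install the model as a UDF
    cur.execute(SCORE_ALL)    # the scoring itself never leaves the engine
    return cur.fetchall()
```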
Did you ever think it would be possible to give ordinary SQL analysts access to data science results? Or to conduct powerful data processing in SQL rather than in the programming languages? This leads not only to more flexibility, but also to exceptional performance.
Technology has to follow your strategy
It will be interesting to see how data science technology evolves over time, and how companies move to leverage every possible way of creating insights, predictions and automated prescriptions out of all kinds of data. This is not just a question of people’s skill sets or particular algorithms, but also of the right architecture for your data ecosystem. It should facilitate data storage, standard reporting and data processing, artificial intelligence, and a flexible way of adjusting to future trends in an open, extensible platform.
The technology should also be available in whatever form you need – from a free downloadable edition that lets developers experiment on their laptops, to high-performance on-premise systems in your secure data centre, to the standard public cloud platforms such as Amazon, Azure or Google. The technology should follow your data strategy, not the other way round.