When it comes to data science, the most recurring topic is modeling. Quite a few articles out there cover data preparation, and only a handful cover how to communicate ...
Do the usual (attend data camps if you don’t have any experience), and you will go nowhere, because your competition is doing exactly the same thing. Do the unusual, ...
Summary: There are several approaches to reducing the cost of training data for AI, one of which is to get it for free. Here are some excellent sources. Recently we w...
Python and R are the two most commonly used languages for data science today. They are both fully open source products and completely free to use and modify as required ...
In the twentieth century, oil was the most valuable resource – but not anymore. In today’s digital age data is the new oil. It will play a similar, perhaps bigger rol...
After my last blog post on using the relational databases PostgreSQL and MonetDB to help compensate for R’s RAM limitations, I received an email from a reader who ask...
When the first release of Spark became available in 2014, Hadoop had already enjoyed several years of commercial growth, from 2009 onwards. Although Hadoop s...
A few days ago, while discussing with colleagues why many organizations were still fumbling at what’s typically called the “last mile” in data science, the conversa...
Knowledge is power. With IDC estimating that the data mountain has now reached five zettabytes, it is not a case of a business not having enough data to make business deci...
We all know that data does not lie. However, when we create visualizations, we may at times “stretch the truth.” This is often done to help others realize the true sto...