Bigger is always better, right?
Well, not necessarily. Even in the realm of big data, companies and governments are beginning to see the value in a “less is more” approach. This stands in stark contrast to the view of data-driven CEOs such as Amazon’s Jeff Bezos, who said, “We never throw away data.”
In fact, the European Union has recently included this in new laws of the Data Protection Act that will come into effect soon. The act says, “Personal data shall be adequate, relevant and not excessive in relation to the purpose or purposes for which they are processed.”
The Act doesn’t define “adequate, relevant and not excessive,” but in effect it means collecting and holding only the minimum amount of personal data needed to fulfil your purpose. This is part of the practice known as “data minimization.”
What is data minimization?
Data minimization refers to the practice of limiting the collection of personal information to that which is directly relevant and necessary to accomplish a specified purpose.
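One way to picture this definition is an allowlist of fields for each purpose, applied before anything is stored. The sketch below is illustrative only: the purpose names, the field names, and the `minimize` helper are all hypothetical and are not taken from any regulation or library.

```python
# Hypothetical mapping from a stated purpose to the fields it actually needs.
ALLOWED_FIELDS = {
    "order_shipping": {"name", "street", "city", "postal_code"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return a copy of record holding only the fields allowed for purpose."""
    try:
        allowed = ALLOWED_FIELDS[purpose]
    except KeyError:
        # Refuse to keep anything for a purpose that was never specified.
        raise ValueError(f"no fields defined for purpose {purpose!r}") from None
    return {key: value for key, value in record.items() if key in allowed}


# Example input: a form submission that contains more than the purpose needs.
submitted = {
    "name": "A. Customer",
    "email": "a.customer@example.com",
    "date_of_birth": "1980-01-01",
    "city": "Leeds",
}
newsletter_record = minimize(submitted, "newsletter")
```

Here `newsletter_record` keeps only the email address; the name, date of birth, and city are dropped before anything is stored. Raising an error for an unknown purpose is a deliberate choice in this sketch, since data collected without a specified purpose is exactly what the practice is meant to avoid.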
As companies and organizations began to understand the power of data, and as data became more ubiquitous and easier to collect, analysts were faced with a “tsunami” of potential data points. For a time, the impulse was to save all of it, indefinitely.
But as the Internet of Things continues to grow, organizations are faced with more and more ways to collect more and more kinds of data, including and especially private, personally identifiable data.
Some companies may still hope to save it all for some future application, but the dangers of data hoarding are similar to those of physical hoarding: mounds of useless junk that make it very difficult to find what we need when we need it. It costs money and time, and can become dangerous.
Instead of a “save everything” approach, smart data managers are now embracing a data minimization policy, keeping only what’s relevant and necessary. Even Walmart relies on only the previous four weeks of data for its day-to-day merchandising strategies.
Benefits of data minimization
I believe companies should only collect and store the data they need — and delete everything else. The value of data decreases very quickly, and storing it “just in case” is a dangerous path.
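As a rough illustration of “delete everything else,” a retention rule can be expressed as a cutoff date. This is a minimal sketch under stated assumptions: the four-week window, the `collected_at` field, and the `purge_expired` helper are hypothetical, and a real policy would set the window per purpose.

```python
from datetime import datetime, timedelta, timezone

# Assumed retention window; pick one that matches the purpose of the data.
RETENTION = timedelta(weeks=4)


def purge_expired(records, now=None, retention=RETENTION):
    """Keep only records whose 'collected_at' falls inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [r for r in records if r["collected_at"] >= cutoff]


# Fixed reference time so the example is reproducible.
now = datetime(2016, 3, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "collected_at": now - timedelta(days=3)},
    {"id": 2, "collected_at": now - timedelta(days=90)},
]
kept = purge_expired(records, now=now)
```

In this example only the record collected three days ago survives; the 90-day-old record falls outside the window and is dropped. In practice the same cutoff would drive a delete in whatever store holds the data, rather than a filter over an in-memory list.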
Data minimization also reduces cost. All data storage costs money, and no business has an infinite budget — so no business can go on collecting and storing data indefinitely.
In addition, too much data (especially personally identifiable data) brings big risks. The consequences of data loss and breaches must be considered, too. A major leak of sensitive personal information can easily destroy a business or even lead to charges of criminal negligence. Imagine how much more galling it would be to fall foul of this when you didn’t even need the data that you lost in the first place!
With the implementation of the Data Protection Act, all businesses that hold data about any European Union citizen will need to make data minimization standard operating procedure to minimize risk. But rather than a tedious new requirement, it should be a benefit for both the company and the individuals it is intended to protect.