Data protection problems, principles and identity solutions

Dave McComb, President of Semantic Arts, noted during a talk in 2021 that one of his banking clients had customers’ US Social Security numbers (unique government IDs that banks typically use to authenticate a customer’s identity and control access to the system) stored in over 8,000 different places.

It was not unusual for this personally identifiable information (PII) to be in duplicate form in different places in a bank’s databases, he said. What was surprising was the sheer number of duplicates in this case. This very high number was clearly a red flag indicating that the bank had long ago lost track of its data.

When companies lose track of PII, it’s a sign that the left hand does not know what the right hand is doing in the organization. Data that’s oversiloed, disconnected and not designed to be discoverable eventually becomes a mostly abandoned mountain of risk rather than information. Consultants like Dave know lots of companies that collect PII and, even though they’re trying to comply with data protection regulations, still have these mountains of risk.

The data protection problem

The rapid proliferation of data in all its forms and manifestations has become a key concern for public sector organizations responsible for public welfare and safety.

Part of the problem is on the data user side of things. Let’s take enterprise customers to begin with. These days, large organizations subscribe to literally thousands of different end user software-as-a-service (SaaS) or other more developer-oriented cloud services (XaaSes).

Similarly, each consumer may use dozens of applications in their daily lives, and may have hundreds they’ve installed on their smartphones. Each app, of course, tends to ask users upon installation to enter personal data such as email address or mobile phone number. Each user, in essence, is doing the provider’s data entry for them in return for access privileges to the provider’s apps.

What does this add up to? Scads of PII that is in motion and duplicated far more than is absolutely necessary.

The other part of the problem is on the data custody side of things. Enterprises have made a habit of storing correlatable PII in centralized repositories that obviously attract identity thieves.

Remember all the data breaches of centralized repositories brimming with correlatable personal data we’ve seen over the years. In 2021, 1,871 different organizations in the US suffered data breaches, according to the Identity Theft Resource Center. That’s up from 1,108 in 2020. No wonder data regulation is on the rise.

What’s behind these regulatory changes?

The European Union was thoughtful and well-intentioned when it crafted the General Data Protection Regulation (GDPR). Previous personal data protection guidance was merely that: guidance. The GDPR, by contrast, includes substantial fines for non-compliance.

Current violators of the GDPR face a fine of up to €20 million, or four percent of the company’s annual worldwide turnover for the preceding year, whichever amount is higher.
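
The “whichever amount is higher” rule is easy to misread, so here is a minimal arithmetic sketch of it. The function name and the sample turnover figures are made up for illustration, and the result is only the ceiling described above, not a prediction of what a regulator would actually impose.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("20000000")  # the EUR 20 million figure cited above
TURNOVER_RATE = Decimal("0.04")      # four percent of annual worldwide turnover


def max_gdpr_fine(annual_worldwide_turnover_eur: Decimal) -> Decimal:
    """Return the fine ceiling: the higher of the fixed cap and 4% of turnover."""
    return max(FIXED_CAP_EUR, TURNOVER_RATE * annual_worldwide_turnover_eur)


# Hypothetical company with EUR 100 million turnover: 4% is EUR 4 million,
# so the EUR 20 million figure is the higher amount.
print(max_gdpr_fine(Decimal("100000000")))

# Hypothetical company with EUR 2 billion turnover: 4% is EUR 80 million,
# which is higher than EUR 20 million.
print(max_gdpr_fine(Decimal("2000000000")))
```

For small companies the fixed figure dominates; for large ones the percentage does, which is why the ceiling scales with company size.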

One of the biggest violators to date has been Amazon. Amazon disclosed in a July 2021 quarterly filing that it had been fined €746 million (over $801 million). Why? Amazon was apparently installing cookies on user devices without users’ say-so.

If you’ve visited a lot of corporate websites recently, you’ve surely noticed the cookie permission requests that appear the first time you arrive at the site, or the first time after you’ve dumped your cookies.

The seven principles of the GDPR, according to compliance consultancy OneTrust, are wide ranging:

Lawfulness, fairness and transparency
Purpose limitation
Data minimisation
Accuracy
Storage limitation
Integrity and confidentiality (security)
Accountability

Together, these seven principles constitute the EU’s demand that organizations exchanging personal data related to inhabitants of EU countries 1) collect personal data only when necessary and 2) manage that personal data, for as long as it’s essential to keep, so as to minimize inaccuracies, duplication, and vulnerability to tampering, theft or other misuse. Those who violate the regulation will be held accountable.

Critics of the GDPR maintain that it’s too broad and vague. But the alternative would be regulation that isn’t comprehensive. With the GDPR, the EU in essence pushes organizations to treat personal data much more carefully than they have before, for good reason.

Looking forward, what’s the best, most cost-effective way to comply with not only the tenets of the GDPR, but also the principles behind it? Fix what’s wrong with how you’re doing data architecture and identity management.

Decentralized identity for PII compliance

What if you as an organization didn’t have to collect correlatable PII? What if the most vulnerable, correlatable personal data didn’t need to be shared at all? Wouldn’t that be simpler and more cost effective?

The World Wide Web Consortium’s Decentralized Identifiers (DID) standard makes it possible for a user to hold verified name, age, address, phone number, educational credentials and so on, encrypted in a wallet on her phone or other computing device. Rather than sharing these details outright and spending time inputting them again and again, the user shares her DID.

Here’s how the DID works, according to cybersecurity and self-sovereign identity provider Avast. Let’s say a bank needs to verify a customer’s home address, for example. The user holds an encrypted home address on her phone that’s digitally signed by the Post Office as a form of verification.

The bank can query the user’s DID and, with her permission via public/private key encryption, confirm the correct address. The address itself doesn’t move and isn’t copied. The bank’s system merely verifies the address via on-device matching. One-way hashed messages are the only data that moves.
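
To make the on-device matching idea concrete, here is a rough sketch using only Python’s standard library. It is not the DID specification or any vendor’s implementation: the `Wallet` class, the function names, and the sample address are all hypothetical, and the check of the Post Office’s digital signature is left out because it would need a public-key library. The sketch only shows how a bank could confirm the address it has on file while exchanging nothing but a random nonce, a one-way hash, and a yes/no answer.

```python
import hashlib
import hmac
import secrets


def normalize(address: str) -> str:
    # Collapse case and whitespace so trivial formatting differences do not break the match.
    return " ".join(address.lower().split())


def address_digest(address: str, nonce: bytes) -> str:
    # One-way hash of nonce + normalized address; a fresh nonce keeps digests from being reused.
    return hashlib.sha256(nonce + normalize(address).encode("utf-8")).hexdigest()


class Wallet:
    """Hypothetical on-device wallet; the verified address never leaves it."""

    def __init__(self, verified_address: str) -> None:
        self._verified_address = verified_address

    def matches(self, nonce: bytes, claimed_digest: str, user_consents: bool) -> bool:
        # Without the user's permission the wallet answers nothing but "no".
        if not user_consents:
            return False
        local_digest = address_digest(self._verified_address, nonce)
        # Constant-time comparison of the two hex digests.
        return hmac.compare_digest(local_digest, claimed_digest)


# Device side: the wallet holds the address (signature verification omitted in this sketch).
wallet = Wallet("12 Example Street,  Springfield")

# Bank side: hash the address on file with a fresh nonce and send only nonce + digest.
nonce = secrets.token_bytes(16)
digest = address_digest("12 example street, springfield", nonce)

print(wallet.matches(nonce, digest, user_consents=True))
```

In a real deployment the wallet’s answer would itself be signed with the user’s private key, and the bank would check the issuer’s signature on the credential; both steps are assumed away here to keep the example self-contained.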

Apple, IBM, Microsoft, and other major enterprise IT suppliers have all expressed support for decentralized identity. But these entrenched tech sector incumbents have created a huge installed base of systems. And users have in many cases allowed themselves to be locked into one group of providers.

Enterprises, other organizations and consumers would do well to face up to the reality of systemic complexity and vulnerability. They should break the mold of old identity and data architecture strategy and embrace the new. Otherwise, compliance costs are going to continue to rise, and systems will grow ever more complex, making compliance less and less possible.