Back in the 2000s, a mentor of mine often quoted Scott McNealy, then CEO of Sun Microsystems (now part of Oracle), who said, “There is no privacy. Get over it.”
My mentor often said as well that many people would give away their data for a “free” T-shirt. We brainstormed about personal data security by imagining first that being on the open web is like being on a public street in the physical world.
Ever since the advent of the web, it’s been quite easy for bad actors and hundreds of thousands of organizations to take, keep, duplicate and share your personal, correlatable data. A whole industry (including gray and black market participants) sprang up long ago around data aggregators, some of whom specialize in disambiguating identities and associating correlatable data with individual identities. Bingo: you’re exposed. This circumstance is what has made doxxing (sharing ill-gotten personal info with malicious intent) possible.
Two decades later, large language models (LLMs) and the enormous datasets being collected, curated and expanded for training those models are opening another view of personally identifiable activity. With enough data from enough sources, it can be easy enough to triangulate and determine who is who.
Data matters more
Twenty years or so ago, author Nick Carr famously said, “IT doesn’t matter.” Carr advocated a passive approach to IT, saying that IT was no longer a differentiator for businesses. After all, most companies licensed the same or similar software packages.
Today, the vast majority of organizations are still passive, although they know they need to be much more active. The reason they’re still passive? They’re assuming to this day that the same software providers they’ve worked with, or the point solution providers who promise something a bit different, are solving their problems for them.
The one caveat is that companies must subscribe to the right software as a service or implement the right software packages, and must earnestly try to optimize their use of this licensed software.
But what if the software doesn’t help the way you expect it to? Well, you can blame the problem on picking the wrong software. Or you can wait for functions to be added in future versions that promise to help more in the way you need.
Companies are still often not thinking about the data. They’re thinking about applications.
Most established enterprises have had the same passive approach as described above for decades. So what’s different now? The focus has thankfully shifted from the applications to the data. The scramble now is for the right data to give us the right answers. Applications are secondary.
Because of the legacy preoccupation with applications, most organizations don’t realize that their (application-centric) architectures aren’t suited to managing data. They often don’t know where all their data is, or how many copies of a customer’s government ID number they’re storing.
Start by making it possible to control and maintain stewardship of that data. Right now, most software providers want to control your data too.
By following the right principles and using the right type of data architecture, you will soon be able to change the means of control and shift the balance of control more in your favor. The right data architecture, in fact, lets you own and protect your data by default.
Innovative identity and data management methods that help
Decentralized and federated web development communities are aligning their efforts with user data protection needs. It’s well worth investigating what’s in the realm of the possible for individual users and exploring some of the most promising approaches. Here are a few thoughts on what seems most promising when it comes to these methods:
- Use on-device matching, decentralized IDs and credential verification messaging instead of sharing correlatable ID numbers. It doesn’t make sense to “protect” the most sensitive data by duplicating it over and over again. Instead, make sure the necessary confirmatory details can be verified by a trusted third party (such as a government agency that issues passports) with the help of decentralized identifiers (DIDs), hashed authentication and authorization messaging, and a shared decentralized ledger. Long term, it doesn’t make sense to share mobile phone numbers or email addresses either, as those are of course correlatable identifiers too. Decentralized identity makes it possible to protect the personal identifiers themselves.
- Store data in your own secured SOLID repositories, and share access to certain data rather than allowing it to be copied or moved. Instead of a centralized repository someone else controls and manages, SOLID pods can serve as a federation of repositories in a supply chain, with each supplier controlling and managing access to its own data, for example. Apps are designed to use data from owner-controlled pods rather than centralized storage. See my post at https://www.datasciencecentral.com/building-a-hypergraph-based-semantic-knowledge-sharing/ for more information.
- Explore decentralized apps (dapps) built to use SOLID or peer-to-peer data networks such as the InterPlanetary File System (IPFS). Content-addressable networks such as IPFS allow each user to manage their own serverless graph, with fine-grained access controls and decentralized identifiers for users and data objects built in. Meanwhile, SOLID-oriented federated web development communities are emerging with their own flavor of less centralized apps. See my BrightTALK webinar The Rise of Decentralized Cloud Storage at https://www.brighttalk.com/webcast/499/565425 for more information.
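To make the idea of content addressing a bit more concrete, here is a minimal sketch in Python. It is not the IPFS API. The `ContentStore` class and the sample shipment records are hypothetical names made up for illustration, and a plain SHA-256 hex digest stands in for the richer content identifiers a real network would use. The point it tries to show is simply that the address of a piece of data is derived from the data itself, so identical content always gets the same address and any change is detectable.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not the IPFS API)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is computed from the content, not assigned by a server.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on read so that altered content is rejected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.put(b"shipment 42: 10 pallets")
same = store.put(b"shipment 42: 10 pallets")     # identical content, same address
changed = store.put(b"shipment 42: 11 pallets")  # different content, new address
```

Because the address doubles as an integrity check, whoever holds an address can confirm they received exactly the data it names, no matter which peer served it. Access control and identity are separate concerns that this sketch does not attempt to cover.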
Users, whether individuals or organizations, need to assert data ownership and control, particularly when it comes to their most sensitive personally identifiable data. They need to be proactive about their data. We’re well past the 2000s. Passivity is no longer an option.
This is a pivotal time we’re living in, and a great time to become more self-reliant.