Tim Berners-Lee has an interesting habit of coming up with ideas that seem hard to explain at the outset, that remain hard to understand even as they become more implemented and refined, and that can go for years with only a few die-hard fans convinced that what he is doing is the best thing since sliced bread, then seemingly overnight achieve such ubiquity that nobody doubts they were inevitable, even if people still have trouble explaining them.
Formally Sir Tim Berners-Lee (he was appointed OBE in 1997, then knighted as a Knight Commander of the Order of the British Empire in 2004) and frequently referred to by his initials TBL, Berners-Lee developed the first protocols used for the World Wide Web in the early 1990s. He was also one of the first people to apply the concept of linking pages to the broader idea of linking semantic concepts together, by establishing the Resource Description Framework (RDF) and the stack built around it. RDF took a long time to take off: in great part because most programmers in the early 2000s were simply not familiar with much of anything having to do with graphs, in part because the ones who were familiar mostly came from academia, and finally because the state of technology at that point was simply not conducive to working with large-scale graphs.
Two decades later, that has changed, and graph databases of various forms are now some of the hottest areas of technology investment in the field, especially as it’s becoming increasingly obvious that they complement machine learning and deep learning systems. However, one of the central challenges with graph databases has been the issue of federation: how do you get information out of multiple databases in a way that is secure, efficient, and easy to use?
Rethinking Data and the Database
A few years ago, Berners-Lee came back to the issue of Linked Data, and why it had failed to take off to the extent that he and many others thought it would. The problems came down to a number of critical points:
- Ontological Complexity. One of the key ideas behind Linked Data was that people would, over time, develop a common schema or protocol for communication, what’s often called an ontology. As it turned out, everyone had their own ideas about what such an ontology should look like, and while there has been significant consensus in some areas (most notably https://schema.org), creating universal ontologies has proven to be a very difficult challenge.
- Complexity of Ownership. Most databases at the time (and even into the present day) were not set up to be run individually – instead, they held large relational tables, often unsecured, with information about potentially millions of people, primarily for the use of companies in their own marketing and promotional campaigns.
- Complexity of Management. One consequence of this is that any data kept has to be hosted, programmed, and maintained, meaning that if you as a person (or even as a company) had information, you also had to set up a hosted cloud server, install a database application, and then build interfaces into that application. Most of this is well beyond the skills of the average person, and the budget to do so can become prohibitive quickly.
- Immature Standards. When Berners-Lee wrote about the Semantic Web in 2001 for Scientific American, the standards were still very much evolving, with some of the most powerful innovations at least a decade or more away.
One assumption that the article made (if not fully called out) was that data would be kept within personal or small-business databases – the example given was a young mother with two school-age kids using a service to contact the dentist and set up an appointment, all by modifying these localized databases, one belonging to the family, the other to the dentist. In Berners-Lee’s vision, such an approach would have solved a number of problems that have since come to plague the Internet:
- Contain Private Information. At the moment, you have virtually no control over the information that exists about you. By creating private, keyed databases, people could store the information they needed (from credit card and medical information to the kids’ academic reports) and only make it available to outside vendors conditionally, and for a limited period of time.
- Duplication and Errors In Information. By creating such a personal database, people with the appropriate key could access it to get the current state of some piece of information, rather than relying upon what is likely an older, outdated reference. This cuts down on the amount of duplication and the propagation of errors.
- Make Information Self-Describing. Most people are not at all interested in the underlying structure or serialization of information. They want to be able to add, edit, and remove properties about things and have the database itself work out the details. Ideally, they even want their data “purse” to recommend the best way to organize that data, without having to get into the messy bits of representation.
- Compartmentalize Responsibly. A writer working on a novel wants to keep track of who is doing what to whom where and when, but doesn’t want this data commingling with tax information except along the outer periphery. This means that it should be trivial to compartmentalize data, to put it into folders but have those folders able to share common information.
- And Federate Organically. Sometimes you do need to connect your novel data with your income tax data, especially when trying to figure out which of your characters and books were most profitable. This means that it should be possible to create composite folders that join information together, permanently or temporarily.
- Make Information Queryable. People should be able to query information easily and with little need for specialized knowledge, but should also be able to get under the covers (see the sketch following this list).
- And Updateable. Data should be able to be updated transparently, either through overt user actions or due to external processes.
- Format Agnostic. I want to be able to drop a picture, a sound file, a video, a 3D object, or any of potentially thousands of other formats on a box and then have the file saved and all of the file metadata extracted transparently, including classification. On the flip side, I want to be able to get to the data in the format that best meets my needs.
- Easy and Inexpensive. Finally, in this future world, creating a new “pod” of information should be inexpensive enough that you can create, merge, split, and delete pods with few limitations, and should be easy enough that the average six-year-old could do it.
- Interconnected. When you put things into the data store, they should connect automatically and transparently.
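To ground a few of these points (self-describing, queryable), here is a minimal sketch in Python using the rdflib library. The pod URL is hypothetical, and the use of schema.org terms is an illustrative choice, not something any specification mandates:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

POD = Namespace("https://alice.example/pod/")   # hypothetical pod namespace
SCHEMA = Namespace("https://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)

# Each fact is a self-describing triple: subject, property, value.
appt = POD["appointments/dentist-june"]
g.add((appt, RDF.type, SCHEMA.Event))
g.add((appt, SCHEMA.name, Literal("Dental checkup for the kids")))
g.add((appt, SCHEMA.startDate, Literal("2022-06-01T10:00:00")))

# Anyone holding the right key can ask questions in SPARQL without
# knowing anything about how the data is physically stored.
q = """
    PREFIX schema: <https://schema.org/>
    SELECT ?name ?when WHERE {
        ?e a schema:Event ; schema:name ?name ; schema:startDate ?when .
    }
"""
for row in g.query(q):
    print(f"{row.name} at {row.when}")
```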
This view of information is profoundly different from what exists in the world now. It combines the notion of a database with a file repository and the core pieces of an operating system, and it upends our entire relationship to data.
Understanding Solid
This, ultimately, was what Tim Berners-Lee wanted to create back in 2001, but it would take two decades to bring that vision to reality. In 2015, Berners-Lee received a grant from Mastercard, Oxford University, and the Qatar Computing Research Institute to help jumpstart the project, and he later co-founded a company called Inrupt to carry the work forward commercially. By 2018, work had begun on a formal specification for the new project, now called Solid, as a freely available standard through the World Wide Web Consortium (W3C). By December 2021, Inrupt had raised an additional $30 million in a Series A round of funding.
What makes Solid so novel is that, while Inrupt itself is a commercial company established to prove out the ideas of Solid, the specification itself is completely open. What this means, in theory, is that any company (or individual) can create its own Solid implementation, using the standard to ensure a common protocol for integration. This is in many ways the same journey that Berners-Lee took thirty years ago, when he created both a protocol for communication (at the time, HTTP) and protocols for server and client behaviors.
Where things differ is that RDF underlies the whole process. This is not immediately obvious, primarily because RDF is essentially acting as a language for abstracting other types of data. The same approach is used by data.world and an increasing number of other service providers, who treat RDF as a lingua franca for relational and document-centric (XML and JSON) content, and who increasingly use it to decompose Word, Excel, and related office documents as well. This means that, out of the gate, many of these same documents are effectively pre-indexed when they are uploaded.
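As a small illustration of RDF as a lingua franca, the sketch below takes an invented JSON document (annotated as JSON-LD) and re-expresses it as Turtle triples. It assumes rdflib version 6 or later, which bundles JSON-LD support:

```python
import json
from rdflib import Graph

# An invented JSON document; the @context maps its keys onto
# schema.org properties so it can be read as RDF.
doc = {
    "@context": {"name": "https://schema.org/name",
                 "employee": "https://schema.org/employee"},
    "@id": "https://example.org/company/acme",
    "name": "Acme Corp",
    "employee": {"@id": "https://example.org/people/jane",
                 "name": "Jane Doe"},
}

g = Graph()
g.parse(data=json.dumps(doc), format="json-ld")
print(g.serialize(format="turtle"))   # same facts, now queryable as a graph
```

Once the data is in graph form, it can be merged and queried alongside content that started life as XML, CSV, or a spreadsheet.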
In this model, a Solid Server is a host (usually cloud-based) that can serve up multiple Solid Pods. A Pod is, in effect, an individual database, typically associated with a given account or profile. The Pod has a global WebID, the underlying idea being that such a WebID is a token that identifies an entity (a person, organization, or agent) for federated authentication (such as that used by OAuth2 and similar single sign-on standards). WebIDs have been around for a while, but they too are evolving, and will likely start being replaced formally by Decentralized Identifiers (DIDs) and Verifiable Credentials, both also W3C standards.
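As a rough sketch of how a WebID works in practice, the snippet below fetches a hypothetical profile document and reads the owner’s name from it with rdflib; real Solid clients layer authentication on top of this:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import FOAF

# Hypothetical WebID: the fragment identifies the person, while the
# document the URL resolves to describes them in RDF.
webid = URIRef("https://alice.example/profile/card#me")

g = Graph()
g.parse("https://alice.example/profile/card", format="turtle")

for name in g.objects(webid, FOAF.name):
    print("This pod belongs to:", name)
```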
Because these standards are being developed in parallel (and are intended to subsume most of the functions of blockchains at a more generalized level), Solid pod verification will likely continue to track them for several years. In many respects, this emerging stack will likely be the successor to HTTPS, and may facilitate credentialing in ways that both provide better metadata support within authentication channels and mitigate the proof-of-work aspects that have made blockchain so problematic.
It should be noted that, because Solid pods can effectively emulate the server/file-path models that have been around since the earliest days of the web, you could actually use Solid to create secure websites with little to no deviation from today’s practices. In practice, however, you could also take advantage of such pods to store graphs of information, providing a level of metadata control far beyond what is available today.
Solid Pods in turn are made up of containers, which are referential analogs to folders. In effect, part of the Solid API involves creating a graph that represents a linked container system, with containers in turn also having global, verifiable credentials. This means that it is possible to assign access controls to both data and resources within the various Pod containers. Because this “file system” can be serialized as Turtle, it also opens up the potential for sending a “manifest” between systems without necessarily sending the actual data (which can aid significantly in discovery).
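Here is a minimal sketch of such a manifest, built with rdflib using the W3C LDP vocabulary (on which Solid containers are modeled); the pod URLs are hypothetical:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

LDP = Namespace("http://www.w3.org/ns/ldp#")
POD = Namespace("https://alice.example/pod/")

g = Graph()
g.bind("ldp", LDP)

# A container is a folder-like resource that lists what it contains.
folder = POD["photos/"]
g.add((folder, RDF.type, LDP.BasicContainer))
g.add((folder, LDP.contains, POD["photos/beach.jpg"]))
g.add((folder, LDP.contains, POD["photos/birthday.jpg"]))

# The manifest can be shipped between systems for discovery,
# without shipping the photos themselves.
print(g.serialize(format="turtle"))
```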
One of the more intriguing aspects of such Pods is that they simplify the process of federation. An organization can have a central pod that provides a data catalog into subordinate pods, and the public (and specific private) information within those pods can be accessed in the aggregate with a simple HTTP SEARCH command. This data can then be cached and aggregated temporarily in a pseudopod (*groan*) that acts as a virtualized pod repository, and deleted when no longer required.
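A back-of-the-envelope version of that pseudopod, with hypothetical pod endpoints and plain unauthenticated fetches standing in for a real HTTP SEARCH, might look like this:

```python
from rdflib import Graph

# Hypothetical pod endpoints exposing Turtle documents.
pod_urls = [
    "https://hr.example/pod/staff.ttl",
    "https://sales.example/pod/accounts.ttl",
]

# Pull permitted triples from each pod into a temporary aggregate.
pseudopod = Graph()
for url in pod_urls:
    pseudopod.parse(url, format="turtle")

# Query across all sources as if they were a single database...
for s, p, o in pseudopod:
    print(s, p, o)

# ...then simply discard the aggregate when the session ends.
```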
Finally, it’s worth noting that there is a very clear delineation between a pod server and a pod client, which can more properly be considered pod applications, not all of which are user-facing.
Pod Potentials
So far, this vision may seem like just another way to build a glorified file system, but it’s worth exploring some use cases, not a few of which are already in active development.
Pod Purses
This is the Ur case, but even here there are a number of potential applications. A pod purse can be thought of as a navigable name/value store: it can act as a secure online wallet, a place for truly managing bookmarks, a personal digital asset management system, a post scheduler, a portable mind map, a virtual key ring, a health and medical record store, and so forth. It allows you to store files and associate metadata with those files, and can even go a long way toward classifying content. Note that such a pod could be online, but could just as readily be associated with a mobile device (possibly with a server pod synced for backup).
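As a minimal sketch of a purse entry, here is file metadata expressed as Dublin Core triples with rdflib; the purse namespace and file names are invented:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCTERMS

PURSE = Namespace("https://alice.example/purse/")

g = Graph()
g.bind("dcterms", DCTERMS)

# The file lives in the purse; its metadata is just more triples.
doc = PURSE["files/passport-scan.png"]
g.add((doc, DCTERMS.title, Literal("Passport scan")))
g.add((doc, DCTERMS.format, Literal("image/png")))
g.add((doc, DCTERMS.created, Literal("2022-01-15")))

# A classification tag, attached the same way as any other fact.
g.add((doc, DCTERMS.subject, Literal("identity-documents")))

print(g.serialize(format="turtle"))
```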
Content Management Systems and Publication Portals
Pods lend themselves naturally to web pages, but with a twist – because Pods are able to communicate with other pods (and because that communication is far more secure), a Pod is essentially able to tap into other sources dynamically, and can consequently build or cache complex Content Management Systems that can fit on a phone, a watch, or similar devices. With suitably equipped goggles or glasses, this makes life-streaming (the creation of either a true or fabricated data stream around a person) far more feasible. Such Pods can also be tied into caching pods on Solid servers, potentially with real-time classification via machine-learning pipelines added to the metadata stream between systems.
Sensor Pods
It’s not just people who can have pods. Sensors can utilize local or aggregator pods to create a near-real-time view of a sensor field that can then be opened and queried with the right permissions, reducing the need for complex protocols on the sensors themselves. Because pods can (theoretically) be set up to validate through something like SHACL, this also opens up another, perhaps more intriguing scenario – a safe way to control actuators, in essence by testing whether any configuration changes made by the actuator commands (expressed as declarative triples) can in fact be imposed on sensor/actuator pairs (such as controllable CCTVs). This has huge implications for everything from smart cities to drones to environmental systems.
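Here is a minimal sketch of that validation step using pyshacl; the sensor vocabulary and the 0–180 degree pan constraint are invented for illustration:

```python
from rdflib import Graph
from pyshacl import validate

# A SHACL shape constraining pan commands to a safe range of angles.
shapes = Graph().parse(data="""
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <https://sensors.example/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PanShape a sh:NodeShape ;
    sh:targetClass ex:PanCommand ;
    sh:property [
        sh:path ex:angle ;
        sh:datatype xsd:integer ;
        sh:minInclusive 0 ;
        sh:maxInclusive 180 ;
    ] .
""", format="turtle")

# An incoming actuator command, expressed as declarative triples.
command = Graph().parse(data="""
@prefix ex:  <https://sensors.example/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:cmd1 a ex:PanCommand ;
    ex:angle "270"^^xsd:integer .
""", format="turtle")

conforms, _, report = validate(command, shacl_graph=shapes)
print(conforms)   # False: this command would over-rotate the camera
print(report)
```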
Digital Twins and System Coordination
This can be carried over to the idea of digital twins, especially when dealing with twins that are part of interrelated systems. In effect, different parts of a system can each have their own Pods reflecting their current state, while another Pod may be a system simulator that works with the existing component Pod datasets to read, test, and control the aggregate system. In this respect, you’re beginning to deal with systems of pods, which is where the real potential for these kinds of systems begins to emerge. At a minimum, the aggregator pods can also serve to simplify the overall state semantics through an ingestion pipeline (an aspect of Pods that I expect to see evolve in tandem with the specification).
Financial Transactions and Smart Contracts
One area where I feel Pods will have a revolutionary impact is in making distributed ledgers truly workable. Depending upon the implementation, Pods can be set up to be immutable – once written, a set of assertions will remain inviolate – and because they have the possibility of creating credentialed DIDs, such assertions can do everything that blockchain can currently do, minus the (rather problematic) proof-of-work layer. Moreover, an immutable pod has far more room to encode semantics than blockchain does.
This also goes a long way towards making smart contracts feasible. While things called smart contracts have been around for a while, most of them tend to be a thin layer of blockchain tied into bundles of legalese. A true smart contract is essentially a semantic agreement that specifies the parties, the resources, constraints, actions, schedules, validations, and compliance rules, making a pod an ideal vehicle for the enforcement and entailment of such contracts. What’s more, these contracts are more readily litigable, because they can in fact be tied into specific legal code.
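A toy sketch of such an agreement as triples might look like the following; the contract vocabulary here is invented, and a real contract would draw on an agreed legal ontology:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("https://contracts.example/ns#")

g = Graph()
g.bind("ex", EX)

# Parties, resource, constraint, and schedule, each as plain facts.
c = EX["contract/1001"]
g.add((c, RDF.type, EX.DeliveryContract))
g.add((c, EX.party, EX["parties/acme"]))
g.add((c, EX.party, EX["parties/bobco"]))
g.add((c, EX.resource, EX["goods/widget-batch-7"]))
g.add((c, EX.deliverBy, Literal("2022-09-30")))
g.add((c, EX.penaltyPerDay, Literal(250)))

print(g.serialize(format="turtle"))
```

Compliance checking then becomes a validation problem of the same kind as the SHACL sensor sketch above.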
Supply Chain Management
After Covid, people have become far more sensitive to the vagaries of supply chains, and it is here that Pods really have the potential to shine. Put a pod on a Pi, or even encode one onto an RFID chip, and bind it to a shipping crate or an individual product. Put another pod on a large shipping container, and put pods on ships, trucks, and aircraft. Because you have a single protocol for communication, you can then create a dynamic view of every product everywhere in your system globally, without the need for time-consuming scanners. And since a graph is declarative, it works well in read-only environments. Often the biggest problem with supply chain management is knowing where something critical is, so that you can reroute a ship to a different (less heavily burdened) port.
Electronic Medical Records
Medical records are complex things, in part because there are so many interactions with so many people over time. A complex of pods could hold the disparate parts, with access then granted to various parties from each of them. A doctor could keep her personal notes on a patient tied to a given patient EHR number but expose selected notes to the patient; clinicians could add notes concerning tests without necessarily getting access to personal information about the patient; and so forth. No one has all of the information, but everyone has the information they need.
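Here is a minimal sketch of that kind of selective access, using the W3C Web Access Control (WAC) vocabulary that Solid builds on. The agents and resources are hypothetical, and note that no authorization at all is written for the doctor’s personal notes:

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

ACL = Namespace("http://www.w3.org/ns/auth/acl#")
POD = Namespace("https://clinic.example/pods/patient-42/")

g = Graph()
g.bind("acl", ACL)

# Grant the clinician read/write on the test results resource only.
auth = POD["tests/.acl#clinician"]
g.add((auth, RDF.type, ACL.Authorization))
g.add((auth, ACL.agent, URIRef("https://lab.example/staff/clinician#me")))
g.add((auth, ACL.accessTo, POD["tests/bloodwork-2022.ttl"]))
g.add((auth, ACL.mode, ACL.Read))
g.add((auth, ACL.mode, ACL.Write))

print(g.serialize(format="turtle"))
```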
Metaverse/AR/VR
A similar approach can be taken with a metaverse or shared-world environment. People interact within a scene graph, but the scene graph doesn’t actually exist on a single server. Instead, it has access to several different pods, some for avatars (actors), some for models, some for sensor fields or similar structures, some for IoT devices, with a pseudopod then being formed that makes it possible for people to interact with one another and within the scene. This also has the advantage of letting users maintain their own histories over time, rather than having those histories stored within a single large contextual graph. As with the digital twin, one of those scene-graph pods would act as an orchestrator for the activities within the scene.
It’s also worth noting another scenario. In a metaverse, props (such as a magical sword or gold coins) move from one pod to another depending upon the interaction: the thief Philia steals the Sword of Schlepper the Sleepy from the Dark Dank Dungeon scene, in effect removing the sword’s data from the DDD scene’s pod and adding it to the thief’s pod. If others see Philia, they will also see the Sword of Schlepper if it’s visible, but they can’t look up the sword unless they specifically query Philia and she gives them permission to see the details. By explicitly tying pods to transactions, you end up with a transactional model that reduces the potential for duplicating artifacts. Of course, it also means that each Gilder (gold piece) in the DDD environment has its own identifier.
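As a toy sketch of that transfer (with an invented game vocabulary, and no real transaction handling), the sword’s triples simply move from one graph to the other:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

GAME = Namespace("https://metaverse.example/ns#")
sword = Namespace("https://metaverse.example/items#")["schlepper-sword"]

scene, philia = Graph(), Graph()
scene.add((sword, RDF.type, GAME.MagicSword))
scene.add((sword, GAME.damageBonus, Literal(7)))

# Move every fact about the sword from the scene pod to the thief's pod.
for triple in list(scene.triples((sword, None, None))):
    philia.add(triple)
    scene.remove(triple)

print(len(list(scene.triples((sword, None, None)))))   # 0: gone from the scene
print(len(list(philia.triples((sword, None, None)))))  # 2: now Philia's
```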
How Solid Is Solid?
The Solid specification is fairly wide-ranging, and several sections are still undergoing revision. However, the specification is reaching a point where commercial alphas are giving way to commercial betas, and I have personally seen a number of Solid pod projects already in use in areas as diverse as property management and EHRs, so the drumbeat surrounding the Solid specification and the applications built on it should start sounding by Summer 2022. By Winter 2023, a number of major players in this space will likely be announcing their own Solid implementations, especially given how central Solid could be to any number of projects and products.