Warning: This is going to get heavy into Turtle code, but I think there’s enough here for it to be worth reading if you are involved in knowledge graph work.
I’ve been working with knowledge graphs a lot lately, and a conversation that I had with a few other ontologists has been resonating in my head. There are many bad ways to model something, but is there any single good way to model a business environment? I’ve known a number of very, very intelligent data modelers and information architects who have extremely well-developed ideas about modeling, yet it’s surprising how difficult it is to build a good model (in the sense that it is representative of the evolving state of an organization) that is also easy to query.
Modeling the Eternal Graph
What I and others are coming to realize is that this is because you actually need two graphs to model anything, the eternal graph and the now graph, and that these are optimized for two very different things.
The now graph describes the state of the world at this moment, with relationships that are descriptive and easily navigable. For instance, the now graph would state that Elizabeth II is the Queen of England. This is true and has been true for an astonishingly long time (since 1952, with her coronation following in 1953). However, at some point in the not too distant future, Charles Windsor will end up becoming King of England, if likely only for a few years and assuming he doesn’t abdicate before taking the throne. Prior to 1952, the King of England was George VI. The now graph of the time would have had an assertion in 1951 stating that George VI was King of England, while the now graph of 2035 will most likely have either Charles or William as the monarch of England.
Now, when this information changes, when Charles becomes King Charles (or whatever name he assumes), many people assume that curators go into knowledge graphs (such as those run by Wikipedia or Google) and manually change these entries. In some cases, this is exactly what happens, but in other situations, a different paradigm is used: an eternal graph creates a new now graph.
An eternal graph is a graph that is valid at any time, and in general this graph represents state changes over time. Eternal graphs frequently represent information in more generalized terms, and may actually get quite complex in structure. For instance, in the now graph that’s current for 6 May, 2022, the relationship between country and monarch may be expressed as a single triple:
Country:_UnitedKingdom Country:hasMonarch Person:_ElizabethWindsor.
Note here both the predicate Country:hasMonarch, which is very specific to a given role, and the association with a specific Person class and an instance for Elizabeth II (Person:_ElizabethWindsor). More than likely the object of this assertion would use some kind of GUID designator to identify the entity, but the important point is that this particular approach makes ease of querying paramount.
However, suppose that you wanted your information to be immutable, or, put another way, you wanted your graph to have history. Being a monarch is in fact a role – it is something that a particular person performs at a given period of time. Expressing this can be more complex:
Role:_QueenElizabethII a Class:_Role;
    Role:hasAffiliation Government:_UnitedKingdom;
    Role:hasDesignation Designation:_Monarch;
    Role:hasPerson Person:_ElizabethWindsor;
    Role:hasStartDate "1952-02-06"^^xs:date;
    Role:hasFormalStartDate "1953-06-02"^^xs:date;
    Role:hasPredecessor Role:_KingGeorgeVI;
.
Role, here, is a considerably more generalized class. What’s more, Role is not in fact a subclass of person, but is its own distinct class, with the role’s temporal bounds (the start and end state of the role’s applicability) being very different from a person’s temporal bounds (when they were born and/or died).
Role is applicable in any number of different circumstances, and is more accurate in general than the simple assertion above, but it comes at a considerable cost in terms of querying. Nowhere do you see the statement that England has the monarch Queen Elizabeth. The models are different.
Which is correct? Both, when the salient fact hasn’t changed, but with the addition of a few more triples, only one is valid. In this particular case, let’s say that Queen Elizabeth decides to abdicate the throne in 2026 due to ill health, putting King Charles III on the throne.
Role:_QueenElizabethII Role:hasEndDate "2026-05-17"^^xs:date.
Role:_KingCharlesIII a Class:_Role;
    Role:hasAffiliation Government:_UnitedKingdom;
    Role:hasDesignation Designation:_Monarch;
    Role:hasPerson Person:_CharlesWindsor;
    Role:hasStartDate "2026-06-17"^^xs:date;
    Role:hasPredecessor Role:_QueenElizabethII;
.
There are several points of significance here. No triples were deleted or modified. This graph is immutable, in that once a triple is written, it is inviolate. The end date acts as a marker for indicating that past this particular date this role is closed and no longer in effect. If, at some point after that, Queen Mother Elizabeth II decides she wants to take back the reins of power and Charles agrees, a new record, for the second reign of Queen Elizabeth, would be started.
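To make the append-only idea concrete, here is a minimal Python sketch rather than anything a triple store would actually run. It treats the eternal graph as a plain list of triples and asks who held a designation on a given date. The list, the helper names value and holder_on, and the use of George VI's historical reign dates are all assumptions made for illustration.

```python
from datetime import date

# Append-only list of (subject, predicate, object) triples standing in for
# the eternal graph. Names mirror the Turtle above; the list itself and the
# helper functions are hypothetical, for illustration only.
eternal = [
    ("Role:_KingGeorgeVI", "Role:hasDesignation", "Designation:_Monarch"),
    ("Role:_KingGeorgeVI", "Role:hasPerson", "Person:_GeorgeWindsor"),
    ("Role:_KingGeorgeVI", "Role:hasStartDate", date(1936, 12, 11)),
    ("Role:_KingGeorgeVI", "Role:hasEndDate", date(1952, 2, 6)),
    ("Role:_QueenElizabethII", "Role:hasDesignation", "Designation:_Monarch"),
    ("Role:_QueenElizabethII", "Role:hasPerson", "Person:_ElizabethWindsor"),
    ("Role:_QueenElizabethII", "Role:hasStartDate", date(1952, 2, 6)),
]


def value(triples, subject, predicate):
    """Return the first object found for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def holder_on(triples, designation, when):
    """Return the person whose role with this designation covers `when`."""
    roles = sorted({s for s, p, o in triples
                    if p == "Role:hasDesignation" and o == designation})
    for role in roles:
        start = value(triples, role, "Role:hasStartDate")
        end = value(triples, role, "Role:hasEndDate")
        # A role with no end date is still open
        if start is not None and start <= when and (end is None or when < end):
            return value(triples, role, "Role:hasPerson")
    return None


print(holder_on(eternal, "Designation:_Monarch", date(1950, 1, 1)))
print(holder_on(eternal, "Designation:_Monarch", date(2022, 5, 6)))
```

Closing a role in this sketch means appending a Role:hasEndDate triple and the triples for the successor role. Nothing already in the list is touched, which is the property the eternal graph depends on.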
Generating the Now Graph From the Eternal Graph
So, how do you get from the eternal graph to the now graph? This is a job for SPARQL Update. While the eternal graph is immutable, the now graph is not:
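# PREFIX declarations are omitted here, as in the other listings
DELETE {
    GRAPH Graph:_NowGraph {
        ?Country Country:hasMonarch ?OldMonarch.
    }
}
INSERT {
    GRAPH Graph:_NowGraph {
        ?Country Country:hasMonarch ?NewMonarch.
    }
}
WHERE {
    GRAPH Graph:_NowGraph {
        ?Country Country:hasMonarch ?OldMonarch.
    }
    GRAPH Graph:_EternalGraph {
        # The closed role held by the monarch currently in the now graph
        ?Role a Class:_Role.
        ?Role Role:hasPerson ?OldMonarch.
        ?Role Role:hasDesignation Designation:_Monarch.
        # Assumes each role also links directly to its country; the sample
        # data above uses Role:hasAffiliation with a Government instead
        ?Role Role:hasCountry ?Country.
        ?Role Role:hasEndDate ?EndDate.
        # The successor role points back at the closed role, not at the person
        ?Role2 a Class:_Role.
        ?Role2 Role:hasPerson ?NewMonarch.
        ?Role2 Role:hasPredecessor ?Role.
        ?Role2 Role:hasStartDate ?StartDate.
        filter(?StartDate > ?EndDate)
        # A fresh variable is needed here, since ?EndDate is already bound
        filter(not exists {?Role2 Role:hasEndDate ?EndDate2})
    }
}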
This is a fairly complex query. In essence, it deletes the old monarch from the now graph and inserts the new monarch into the same graph, yet most of the heavy lifting is carried out in the eternal graph, using the end date of the old monarch’s reign as a signal to look for the new entry. Note that while this changes just one value (the country’s current monarch), a single update pass may well make several such changes to the now graph; either way, after the update the now graph reflects what is currently known.
This can be generalized further if you recognize that a government class may be better than a country class here, and that a government is an organization that represents a country. In other words:
DELETE {
    GRAPH Graph:_NowGraph {
        ?Organization Organization:hasLeader ?OldLeader.
    }
}
INSERT {
    GRAPH Graph:_NowGraph {
        ?Organization Organization:hasLeader ?NewLeader.
    }
}
WHERE {
    GRAPH Graph:_NowGraph {
        ?Organization Organization:hasLeader ?OldLeader.
    }
    GRAPH Graph:_EternalGraph {
        ?Role a Class:_Role.
        ?Role Role:hasPerson ?OldLeader.
        # Any designation typed as executive qualifies, not just monarchs
        ?Role Role:hasDesignation ?RoleDesignation.
        ?RoleDesignation RoleDesignation:hasType RoleDesignationType:_Executive.
        ?Role Role:hasOrganization ?Organization.
        ?Role Role:hasEndDate ?EndDate.
        # The successor role points back at the closed role, not at the person
        ?Role2 a Class:_Role.
        ?Role2 Role:hasPerson ?NewLeader.
        ?Role2 Role:hasPredecessor ?Role.
        ?Role2 Role:hasStartDate ?StartDate.
        filter(?StartDate > ?EndDate)
        # A fresh variable is needed here, since ?EndDate is already bound
        filter(not exists {?Role2 Role:hasEndDate ?EndDate2})
    }
}
In this case, any leader entry will be updated automatically if an end date exists on the current leader’s role and a new role exists whose predecessor is that role. The only difference from the monarch query is that the role’s designation is matched by its type (executive) rather than by one specific designation.
These updates are fairly comprehensive, and for large knowledge systems they are only done at periodic intervals, such as once a day. The advantage of this approach is that the now graph represents an easily queryable environment, which significantly improves performance as well, while the eternal graph can accumulate changes in the background and maintain integrity and auditability. The eternal graph can also be queried, which is especially useful when dealing with information changing over time, but it requires more significant SPARQL chops to do so.
Another advantage to the split duality of an eternal graph and a now graph is that you can perform calculations on the eternal graph then generate the results on the now graph. For instance, consider the following script:
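DELETE {
    GRAPH Graph:_NowGraph {
        ?Person Person:hasAge ?oldAge.
    }
}
INSERT {
    GRAPH Graph:_NowGraph {
        ?Person Person:hasAge ?newAge.
    }
}
WHERE {
    GRAPH Graph:_EternalGraph {
        ?Person a Class:_Person.
        ?Person Person:hasStartDate ?birthDate.
        optional {
            ?Person Person:hasEndDate ?deathDate.
        }
        # Measure to the date of death if there is one, otherwise to now
        bind (coalesce(?deathDate, now()) as ?endDate)
        # year(), month() and day() are specified for xs:dateTime; many
        # engines also accept xs:date values, which is assumed here
        bind (year(?endDate) - year(?birthDate)
              - if(month(?endDate) < month(?birthDate)
                   || (month(?endDate) = month(?birthDate)
                       && day(?endDate) < day(?birthDate)), 1, 0)
              as ?newAge)
    }
    # Looked up after the person is known, so that people with no age
    # recorded yet in the now graph still receive one
    optional {
        GRAPH Graph:_NowGraph {
            ?Person Person:hasAge ?oldAge.
        }
    }
}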
The calculation in this particular case is fairly trivial – it gives the current age in years of any given person if they are alive, and their age at death if not – but the ramifications are far from trivial. The now graph can serve as a running index for quantities that do change with time, again reducing the complexity of queries for end users.
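The arithmetic behind that script is easy to get subtly wrong, so here is the same calculation as a small Python sketch. The function name age_in_years and its today parameter are hypothetical, and the birth dates in the usage lines are the publicly known ones for Elizabeth II and Charles, supplied purely as sample input.

```python
from datetime import date
from typing import Optional


def age_in_years(birth: date, death: Optional[date] = None,
                 today: Optional[date] = None) -> int:
    """Age at death if a death date is given, otherwise age as of `today`."""
    end = death or today or date.today()
    # Subtract one year if the birthday has not yet occurred in the end year
    before_birthday = (end.month, end.day) < (birth.month, birth.day)
    return end.year - birth.year - (1 if before_birthday else 0)


as_of = date(2022, 5, 6)
print(age_in_years(date(1926, 4, 21), today=as_of))   # Elizabeth II
print(age_in_years(date(1948, 11, 14), today=as_of))  # Charles
```

Passing the reference date in explicitly, rather than always calling date.today(), is what makes it possible to regenerate the now graph for any chosen day.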
RDF-Star, Reifications, and the Eternal Graph
RDF-Star provides another way of building an eternal graph, one in which the now statements are annotated. RDF-Star provides syntactic sugar for reifications. Reifications were first introduced back in the early 2000s but were not used extensively because of performance issues. By 2017, however, with the rising prominence of property graphs, a couple of new RDF-Star proposals had been made, the most recent being the one covered here.
A reification is a statement about a statement. For instance, consider the three statements:
Country:_UnitedKingdom Country:hasMonarch Person:_EdwardWindsor.
Country:_UnitedKingdom Country:hasMonarch Person:_GeorgeWindsor.
Country:_UnitedKingdom Country:hasMonarch Person:_ElizabethWindsor.
These statements cannot all be true in a now graph, but they can be true in an eternal graph. One way of better describing these relationships, so that you can meaningfully derive the now graph from them, is to use reification to determine when the above statements are true:
<<Country:_UnitedKingdom Country:hasMonarch Person:_EdwardWindsor>>
    Assertion:hasStartDate "1936-01-20"^^xs:date;
    Assertion:hasEndDate "1936-12-10"^^xs:date
.
<<Country:_UnitedKingdom Country:hasMonarch Person:_GeorgeWindsor>>
    Assertion:hasStartDate "1936-12-11"^^xs:date;
    Assertion:hasEndDate "1952-02-06"^^xs:date
.
<<Country:_UnitedKingdom Country:hasMonarch Person:_ElizabethWindsor>>
    Assertion:hasStartDate "1952-02-06"^^xs:date;
.
The syntax <<s p o>> indicates that the whole assertion should be treated as a single resource. In this case, the start date and end date then describe the assertion, specifically indicating when the base assertion is considered to be in force.
SPARQL-Star is an extension to SPARQL that makes it possible to query RDF-Star assertions using the new notation. To find out who the monarch of the United Kingdom was in 1950, you’d make the query:
select ?Monarch where {
    values ?TestDate {"1950-01-01"^^xs:date}
    <<Country:_UnitedKingdom Country:hasMonarch ?Monarch>>
        Assertion:hasStartDate ?StartDate.
    # Once ?Monarch is bound, the annotated statement can be held in a variable
    bind (<<Country:_UnitedKingdom Country:hasMonarch ?Monarch>> as ?Assertion)
    optional {
        ?Assertion Assertion:hasEndDate ?EndDate.
    }
    filter(?StartDate <= ?TestDate)
    # An assertion with no end date is still in force
    filter(!bound(?EndDate) || ?EndDate > ?TestDate)
}
This query makes use of an interesting trick – a reification is a blank node that identifies the subject, predicate, and object of a statement. This node can be stored and used with anything that generated that particular triple. This is a subtle, but important, point. The notation
<<Country:_UnitedKingdom Country:hasMonarch Person:_GeorgeWindsor>>
    Assertion:hasStartDate "1936-12-11"^^xs:date;
    Assertion:hasEndDate "1952-02-06"^^xs:date
.
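identifies THIS particular statement. If the statement had been made again, despite referring to the exact same three resources, it would have been a different assertion, which is the same behavior that one would expect in a labeled property graph. (Some RDF-Star implementations instead treat identical quoted triples as one and the same resource, so it is worth checking how your triple store behaves.) Consider the following: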
<<Country:_UnitedKingdom Country:hasMonarch Person:_EdwardWindsor>>
    Assertion:hasStartDate "1936-01-20"^^xs:date;
    Assertion:hasEndDate "1936-12-10"^^xs:date;
    Assertion:hasAuthor Writer:_JaneDoe;
.
<<Country:_UnitedKingdom Country:hasMonarch Person:_EdwardWindsor>>
    Assertion:hasStartDate "1936-01-21"^^xs:date;
    Assertion:hasEndDate "1936-12-13"^^xs:date;
    Assertion:hasAuthor Writer:_MichaelJohnson;
.
In this case, you have two different writers who have made (contradictory) annotations concerning the same assertion. This is not an uncommon situation, especially when discussing historical personages from far enough in the past that exact details cannot be authenticated.
This also points to a distinction between using a class-based eternal graph and a reification-based eternal graph. The class-based system asserts when a given instance comes into effect and goes out of effect, and is tied to the behavior of that class. A reification-based approach, on the other hand, applies only to a specific statement, and serves to qualify that assertion: when was the assertion true, who made the assertion, how strong is the assertion, and so forth.
It so happens that you CAN take advantage of the syntactical notation of the reification to do things like determine the distance between two airports, but ONLY if all of the metadata properties happened to be bound initially to the same set of resources. If you have information coming from multiple sources, you’re likely to get some non-intuitive behavior if you’re not careful about that condition.
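One way to picture that non-intuitive behavior is the Python sketch below, which is a toy model and not how any particular triple store works. It stores the two writers' annotations from the earlier example in two ways: keyed by the triple itself, and keyed by a separate statement identifier. The container names by_triple and by_statement and the identifiers stmt1 and stmt2 are hypothetical.

```python
from collections import defaultdict

triple = ("Country:_UnitedKingdom", "Country:hasMonarch", "Person:_EdwardWindsor")

# Store A: annotations keyed by the triple itself, so both writers end up
# annotating one shared node.
by_triple = defaultdict(list)
by_triple[triple].append(("Assertion:hasStartDate", "1936-01-20"))
by_triple[triple].append(("Assertion:hasAuthor", "Writer:_JaneDoe"))
by_triple[triple].append(("Assertion:hasStartDate", "1936-01-21"))
by_triple[triple].append(("Assertion:hasAuthor", "Writer:_MichaelJohnson"))

# Store B: each occurrence of the statement gets its own identifier, so each
# writer's annotations stay together.
by_statement = {
    "stmt1": {"triple": triple,
              "Assertion:hasStartDate": "1936-01-20",
              "Assertion:hasAuthor": "Writer:_JaneDoe"},
    "stmt2": {"triple": triple,
              "Assertion:hasStartDate": "1936-01-21",
              "Assertion:hasAuthor": "Writer:_MichaelJohnson"},
}

# In store A there are two start dates and no record of who supplied which.
start_dates = [v for k, v in by_triple[triple] if k == "Assertion:hasStartDate"]
print(start_dates)

# In store B the pairing of author and start date is preserved.
for stmt_id, annotations in by_statement.items():
    print(stmt_id, annotations["Assertion:hasAuthor"],
          annotations["Assertion:hasStartDate"])
```

When data arrives from multiple sources, it matters which of these two shapes your store actually implements, because only the second keeps each source's claims separable.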
Finally, it should be noted that the reification approach can also be used with probabilities. For instance, Queen Elizabeth II is 96 years old. The probability that she will live another 15 years is about 1%. Charles is 73. The probability that he will live another 15 years is 49%. William is 39 years old, and the probability that he will live another 15 years is 91% (based upon average UK actuarial tables). These assertions can be applied to the relevant statements.
<<Country:_UnitedKingdom Country:hasMonarch Person:_ElizabethWindsor>>
    Assertion:hasStartDate "1952-02-06"^^xs:date;
    Assertion:has2037SurvivalLikelihoodOf "1.0"^^xs:percent;
.
<<Country:_UnitedKingdom Country:hasMonarch Person:_CharlesWindsor>>
    Assertion:has2037SurvivalLikelihoodOf "49.0"^^xs:percent;
.
<<Country:_UnitedKingdom Country:hasMonarch Person:_WilliamWindsor>>
    Assertion:has2037SurvivalLikelihoodOf "91.0"^^xs:percent;
.
These can be used to inform Bayesian calculations to determine the likelihood that the potential candidate in question will be the monarch in the given year (2037). This can be calculated as 1.0%, 48.5%, and 45.9% respectively. That leaves about a 4.5% chance, based upon life expectancy alone, that it will be someone else, most likely Prince George of Cambridge, William’s eldest. The likelihood that the throne will be held by another queen by then is vanishingly small, about 0.005%, as Princess Charlotte, who is seven years old and fourth in line for the throne, is only a little younger than George.
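The first three figures can be reproduced with a simple chain calculation, sketched below in Python. It is a simplification rather than a full Bayesian model: it assumes each person's survival is independent of the others, that the crown passes strictly down the line, and that nobody abdicates. The function name monarch_probabilities is hypothetical, and the survival values are the ones used in the annotations above.

```python
# Survival likelihoods to 2037, in order of succession, taken from the
# annotations above and expressed as fractions rather than percentages.
line_of_succession = [
    ("Elizabeth", 0.01),
    ("Charles", 0.49),
    ("William", 0.91),
]


def monarch_probabilities(line):
    """Chance that each person is monarch in the target year.

    Assumes independent survival and strict succession: a person reigns
    only if they are alive and everyone ahead of them is not.
    """
    result = {}
    all_earlier_gone = 1.0
    for name, p_alive in line:
        result[name] = all_earlier_gone * p_alive
        all_earlier_gone *= (1.0 - p_alive)
    # Whatever probability is left belongs to heirs further down the line
    result["someone else"] = all_earlier_gone
    return result


for name, p in monarch_probabilities(line_of_succession).items():
    print(f"{name}: {p * 100:.1f}%")
```

Each person's share is their own survival likelihood multiplied by the chance that everyone ahead of them has gone, so the shares always sum to one.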
Summary
So what’s the best way to define an eternal graph while reducing the overhead associated with searching? As a general rule of thumb, first build out the now graph. Then, when the state of an item in that now graph changes, close out the reification on the existing state (by setting the end date) and create a new assertion with the new values, along with a reification indicating that the new assertion starts being true at that point. The distance between two airports may not change very often, but the price of a trip from one airport to the next may very well change daily.
At the same time, treat the reification as an annotation – it may describe provenance, temporal scope, and potentially topical associations, but the property data should always be reducible to a now graph if a given date is specified. The interstitial reifications may refer to older reifications (this is useful for versioning, in fact) but should not usually refer to anything explicitly in the now graph beyond the statement being annotated.