I once posted about making use of narrative objects. In this blog, I will be discussing an algorithm that supports the creation of these objects. I call it my “Infereferencing Algorithm”: this term is most easily pronounced with a slight pause between “infer” and “referencing.” I consider this a useful and widely applicable algorithm although I don’t believe it operates well in a relational database environment. Instead, I use “mass data files”: these contain unstructured lumps of symbols or tags.
Infereferencing is a process of extracting aspects of a larger story (defined by the story itself) using much smaller pieces (defined by my specific needs). Perhaps the most straightforward analogy is an online search: the user presents a string of text to the search engine (defined by his or her own needs), and the engine returns a listing of links that seem applicable, even though the underlying articles follow their own spheres of discourse. Keywords might already exist for the resources; or the search engine might pre-compile a list of applicable terms for faster access in the future. But these indexes should not obscure the fundamental difference in kind between the submission and the outcomes. I will elaborate.
Stepping back a bit to consider the general dynamics, it should be apparent that the person seeking information suffers from its absence. This is not merely an absence of “content” but also of “context.” A submission to a search engine conveys not merely content; the content is presented within a framework of context. Content and context are so frequently conflated that it almost seems there is nothing to differentiate. This is why it is not immediately obvious that the user is submitting an externally defined context in order to obtain objects having their own internally conceived contexts.
In the process of infereferencing, the objective is to canvass the internal context of the data, which is not necessarily defined by the user. Rather, there is merely a superficial intersection between the submission and the parallel narrative object. If we consider a database of crime scenes, for example, the user does not set the context in which data is collected. He or she sets the ontological boundaries of the submission, which serve a purely instrumental purpose relative to the user’s goals and objectives. Those gathering data from a homicide investigation aren’t necessarily thinking about the user (at least not on a personal level), and possibly not at all. Their instrumental needs relate to forensics, record-keeping, policies, and practices.
In the example provided here, I am interested in stories involving “lurking, waiting, and hiding.” Please take a moment to consider the significance and meaning of these terms, which might be a matter of preconception and social construction. I am saying that the person seeking data might not be thinking the same way as the individuals who obtained the data; the specific terms might therefore not carry the same meaning. This naturally occurring fog is typically minimized or eliminated in controlled experiments, but with routine field data it is necessary to take into account the complexities posed by ontology. This is the cost of making use of non-experimental data.
The names of the “scents” are a bit arbitrary, but to help readers I am calling the data scents Lurking.txt, Waiting.txt, and Hiding.txt. I will only be elaborating on the first scent, Lurking.txt. The utility handling the algorithm generates a gradient for the “scent file”; it does not create the “test file” or the “mass data files.” The person collecting the data is responsible for (his or her contribution to) the mass data file. Test files might be developed by any person to help users make sense of the mass data. (I don’t expect many humans to read mass data.) Below, the scent file “Lurking.txt” triggers a gradient analysis of the test file lurking.txt in the test directory called “Observers.”
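To make the moving parts concrete, here is a minimal sketch in Python of how such a utility might resolve a scent file into its test files. The post does not prescribe a file format, so the paths, the one-entry-per-line layout, and the function names below are all assumptions made purely for illustration.

```python
# Hypothetical layout, assumed for illustration only:
#   Lurking.txt           - scent file; each line names a test file (e.g. "Observers/lurking.txt")
#   Observers/lurking.txt - test file; each line is a tag such as <lurking> or <luring>
#   massdata/*.txt        - mass data files containing unstructured runs of tags

def read_scent_file(path):
    """Return the list of test files named by a scent file."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def read_test_file(path):
    """Return the set of tags listed in a test file."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip().startswith("<")}

if __name__ == "__main__":
    for test_path in read_scent_file("Lurking.txt"):
        print(test_path, read_test_file(test_path))
```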
The configuration of the scent file can vary greatly since it doesn’t necessarily have to use a gradient from a single test file. In a traditional scent scenario, there might be many dozens of test files, apart from lurking.txt containing <lurking> and <luring>. A “scent,” after all, is not necessarily something meant to be traced. The scent of chocolate makes me think of my father, who worked as a mechanic at a chocolate factory; this makes me think of simpler times free of responsibility; and for some reason this makes me think of wandering in a bookstore. Anyway, the idea is that a “scent” is multifarious. It is possible to smell danger, comfort, and one’s childhood. Infereferencing, however, is a highly focused exploitation of data odours.
The test file indicates that the symbol for lurking is actually associated with the tags <lurking> and <luring>. The analyst should give this matter some thought: the extent to which the association conforms to his or her needs. For example, although any target might be lured, I personally would expect luring to be more relevant in relation to children. This isn’t really a discussion of dictionary definitions but of social context and significance. It is a bit like playing Family Feud. “Of 1,000 people surveyed, what is the most frequently eaten food served with eggnog? Also, what does lurking mean to you?”
A list is compiled from the mass data files that have been scanned for the test elements. Each file on the list contains <lurking> and <luring>. Now the interesting part is compiling a list of tags from those mass data files and counting the number of events, a process that I call “turning.” When a person is bitten by a werewolf, there is a chance he or she might “turn” a bit later, right? The image below shows that systematic testing gives rise to a “turn-table”: when lurking (the scent) is part of the storyline, luring (the event) is actually more prevalent than lurking (the event). Also important are the tags for forced confinement, missing person, and vocal arrest.
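A rough sketch of this “turning” step, again in Python and again under assumed file layouts and tag conventions: keep only the mass data files that contain every tag from the test file, then tally every tag appearing in those files to form the turn-table.

```python
import glob
import re
from collections import Counter

TAG = re.compile(r"<[^<>]+>")  # tags are assumed to look like <lurking>, <luring>, etc.

def turn_table(mass_dir, required_tags):
    """Tally all tags found in mass data files that contain every required tag."""
    counts = Counter()
    for path in glob.glob(f"{mass_dir}/*.txt"):
        with open(path, encoding="utf-8") as f:
            tags = TAG.findall(f.read())
        if required_tags.issubset(tags):   # the file matches the scent
            counts.update(tags)            # "turning": count the co-occurring events
    return counts

if __name__ == "__main__":
    # The required tags would normally come from the test file (see the earlier sketch).
    table = turn_table("massdata", {"<lurking>", "<luring>"})
    for tag, n in table.most_common(10):
        print(f"{tag}\t{n}")
```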
What do I mean by “vocal arrest”? I mean that the person is prevented from calling for help; the victim is impaired in his or her ability to vocalize. Now guess what “digital extraction” means. (I provide this example just to make a point.) Digital extraction means having one’s fingers chopped off, or in any event severed from the hand. It has nothing to do with digital imaging. Consequently, test files and the symbols they contain represent the “keys to the kingdom,” helping to explain the intended meaning of tags in the mass data files.
This is it as far as basic infereferencing goes. If a person wants to know the narrative events involving lurking in subway stations, it is necessary to add subway stations to the test file. Or better still, it would be worthwhile to include tags pointing to enclosed public areas underground. I would describe the tags shown in this post as “vernacular” since they do not pertain to any formal models. This is because I deleted the rows that pertain to models. (I didn’t want to confuse the issue.)
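As a hypothetical illustration of that narrowing (none of these tag names come from the original test files; they are placeholders), the matching criteria might be extended so that a mass data file must carry the lurking tags plus at least one underground-setting tag:

```python
# Hypothetical extension of the lurking criteria; all tag names are invented for illustration.
required_tags = {"<lurking>", "<luring>"}
setting_tags = {"<subway_station>", "<underground_concourse>", "<pedestrian_tunnel>"}

def matches(file_tags):
    """A mass data file matches if it has every required tag and at least one setting tag."""
    return required_tags.issubset(file_tags) and not setting_tags.isdisjoint(file_tags)
```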
A storyline might involve victims and perpetrators, settings, circumstances, and social dynamics that are part of a coherent model. Lurking could be associated with particular character types – at least in relation to the narrative resources that I use. How about terrorists, serial killers, and mass murderers? I would say that the more specialized the nature of the parallel narrative objects, the greater the need to make use of models.
Infereferencing is meant for situations that tend to be non-repetitive, where the case is characterized by the absence of data, making it necessary to “infer” using external “references” (infer-referencing). I also call this algorithm the “Half Crosswave Differential Algorithm.” The (full) Crosswave Differential Algorithm is meant for repetitive events such as, say, treatments, production, and transit control. Users interested in this other algorithm should search Data Science Central for my articles; I have already posted a number of blogs covering it.
Potential Applications
It would be interesting to submit <high_sales> <aisle_6> and read the sort of events that seem connected. For example, mopping the floor during certain times of the day might influence the sale of merchandise from aisle 6. The amount of lighting might affect sales, along with the restocking schedule. I recently wrote about a scenario where some agents might have more success selling to women than to men, depending on their character traits. Clearly, there are many useful business applications.
In the event a [trait] is associated with <high_sales>, what logically follows in the analysis? Personally, I would check the extent to which the [trait] is associated with <low_sales> and <average_sales>. Then what? I would check the distribution of sales in the absence of the [trait]. If there is little data, infereferencing can provide worthwhile general guidance. Or, if quite a lot of data is available, it is possible to invoke the Crosswave Differential.
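Continuing the earlier sketch (same assumed file layout, and with made-up tag names such as <friendly_banter> standing in for the [trait]), that follow-up check might look like this: tally the sales-level tags among the mass data files that carry the trait and among those that do not.

```python
import glob
import re
from collections import Counter

TAG = re.compile(r"<[^<>]+>")
SALES_LEVELS = {"<high_sales>", "<average_sales>", "<low_sales>"}   # assumed tag names

def sales_by_trait(mass_dir, trait_tag):
    """Compare the distribution of sales tags with and without a given trait tag."""
    with_trait, without_trait = Counter(), Counter()
    for path in glob.glob(f"{mass_dir}/*.txt"):
        with open(path, encoding="utf-8") as f:
            tags = set(TAG.findall(f.read()))
        target = with_trait if trait_tag in tags else without_trait
        target.update(tags & SALES_LEVELS)
    return with_trait, without_trait

if __name__ == "__main__":
    present, absent = sales_by_trait("massdata", "<friendly_banter>")  # hypothetical trait
    print("trait present:", dict(present))
    print("trait absent: ", dict(absent))
```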
Frequently, when a large amount of quantitative data is collected, the circumstantial details surrounding the data are lost. An analyst might be asked, “Why was there such a big decline in sales on this day?” The statistician will say, and can say, almost nothing, at least from a statistical standpoint; it is important to recognize this fundamental deficiency. Quantitative data contains essentially no narrative, that is to say, no context surrounding its pathology, morphology, or development. Infereferencing provides what I consider a simple and straightforward foundation for obtaining guidance from the context of data. The algorithm is so practical and easy to use that I can’t help but invite readers to incorporate it into their business systems.