In the last decade, the number of networks, devices, apps and cloud components a typical IT department needs to handle has exploded. So has the list of vendor tools available for monitoring how effectively each part of the IT infrastructure is performing. But correlating datasets from multiple systems isn't always straightforward.
What exactly is dataset correlation?
The general term 'data correlation' can have multiple meanings. It can describe anything from how well the latest COVID vaccine dosage is working on patients, through to the impact interest rates have on house prices. As a ubiquitous term, its use can sometimes be misleading. But 'data correlation' usually refers to testing the relationship between quantitative or categorical variables. In other words, it's a measure of how things are related. In this post we're referring to correlations between IT datasets.
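To make that concrete, here's a minimal sketch in Python showing how the relationship between two quantitative IT metrics might be measured. The series names and values are purely illustrative, not real data.

```python
# Minimal sketch: measuring how two quantitative IT metrics move together.
# The series below are hypothetical placeholders, not real data.
from statistics import correlation  # available in Python 3.10+

daily_alert_count   = [12, 9, 15, 22, 30, 28, 18]     # e.g. security alerts per day
daily_failed_logins = [40, 35, 50, 80, 120, 110, 60]  # e.g. failed logins per day

# Pearson's r: +1 = strong positive relationship, 0 = none, -1 = strong negative
r = correlation(daily_alert_count, daily_failed_logins)
print(f"Pearson correlation: {r:.2f}")
```

A coefficient close to +1 or -1 suggests the two metrics move together; a value near 0 suggests little linear relationship between them.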
What are the typical dataset correlation issues affecting IT Security & Monitoring?
A recent report by Dell stated that ‘most advanced Cybersecurity attacks will go unnoticed for an average of 197 days’*. This is usually because the data correlation signals are simply hidden in plain sight.
Other times a warning signal might be triggered in one system, for example an endpoint or malware alert tied to a network or device ID. But without correlated data, it might not be immediately obvious which users or groups could be impacted, so the mean-time-to-resolution (MTTR) suffers.
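As a rough illustration of the kind of correlation that helps here, the sketch below joins a hypothetical endpoint alert feed to a hypothetical device-to-user directory. The field names are assumptions for illustration, not any particular vendor's schema.

```python
# Minimal sketch: enriching an endpoint alert with user/group context.
# Column names and values are hypothetical; in practice they would come
# from your endpoint tool's API and your identity/asset-management system.
import pandas as pd

alerts = pd.DataFrame({
    "device_id": ["DEV-1042", "DEV-2211"],
    "alert":     ["malware_detected", "suspicious_process"],
})

directory = pd.DataFrame({
    "device_id": ["DEV-1042", "DEV-2211", "DEV-3307"],
    "user":      ["a.smith", "j.patel", "l.chen"],
    "group":     ["Finance", "Engineering", "HR"],
})

# A simple join answers "which users and groups does this alert affect?"
impacted = alerts.merge(directory, on="device_id", how="left")
print(impacted[["alert", "user", "group"]])
```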
Sometimes you need to cross-correlate IT Security & Monitoring data with other organisational data. For example, imagine you run an online business and your server goes down after hitting max CPU during the busy Black Friday weekend. A quick correlation with the corresponding 'cost-of-being-offline' data can help justify your department's requirements to senior management, for example boosting server capacity during the periods the server is likely to max out again, saving your organisation potential lost revenue.
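A back-of-the-envelope version of that 'cost-of-being-offline' calculation might look like the sketch below. The outage window and revenue-per-minute figure are hypothetical placeholders.

```python
# Minimal sketch: putting a cost figure against an outage window.
# All numbers are hypothetical placeholders for illustration.
from datetime import datetime

outage_start = datetime(2023, 11, 24, 14, 5)   # server hit max CPU
outage_end   = datetime(2023, 11, 24, 15, 35)  # service restored
revenue_per_minute = 250.0                     # e.g. from your sales platform

downtime_minutes = (outage_end - outage_start).total_seconds() / 60
estimated_lost_revenue = downtime_minutes * revenue_per_minute

print(f"Downtime: {downtime_minutes:.0f} min, "
      f"estimated lost revenue: £{estimated_lost_revenue:,.2f}")
```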
What makes IT dataset correlation so difficult?
Most attempts to correlate data across different platforms involve IT managers manually logging in to several different systems before they can even begin to make sense of the data.
Even when they can access the relevant data, the platforms will often show it in different unit/metric formats or date/time ranges, the interpretation of those metrics can vary from one IT manager to another, and the export formats will probably differ. Errors get introduced and there is seldom an agreed 'single source of truth', which can cause disagreements within the team or, worse, mean the important warning signal gets lost in the noise.
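As a simple illustration of the unit and timestamp mismatch problem, the sketch below normalises two hypothetical exports (one using epoch seconds and CPU as a fraction, the other ISO-8601 strings and percentages) onto a common UTC timeline before comparing them. All field names and values are assumptions.

```python
# Minimal sketch: aligning two exports that use different timestamp formats
# and units before any correlation is attempted. Field names are hypothetical.
import pandas as pd

# Tool A exports epoch seconds and CPU as a 0-1 fraction
tool_a = pd.DataFrame({"ts": [1700000000, 1700003600], "cpu": [0.42, 0.95]})
tool_a["ts"] = pd.to_datetime(tool_a["ts"], unit="s", utc=True)
tool_a["cpu_pct"] = tool_a["cpu"] * 100

# Tool B exports ISO-8601 timestamps and CPU as a percentage
tool_b = pd.DataFrame({"time": ["2023-11-14T22:13:20+00:00",
                                "2023-11-14T23:13:20+00:00"],
                       "cpu_pct": [41.0, 96.0]})
tool_b["ts"] = pd.to_datetime(tool_b["time"], utc=True)

# Once both sides share UTC timestamps and the same unit, they can be
# matched on time and compared like-for-like.
aligned = pd.merge_asof(tool_a.sort_values("ts"),
                        tool_b[["ts", "cpu_pct"]].sort_values("ts"),
                        on="ts", suffixes=("_a", "_b"),
                        tolerance=pd.Timedelta("5min"))
print(aligned)
```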
A highly skilled Security / Monitoring / IT Manager will usually have great domain knowledge of the datasets they're used to working with. But they aren't necessarily also a software engineer, data architect or data scientist, which means they won't be in their comfort zone accessing data via APIs and can quickly get stuck wrangling huge datasets.
What's more, once you start moving data around there are often Data Protection / GDPR / data-governance policies that must be adhered to, which can present new challenges to anyone unfamiliar with them.
What solutions are there for improving data correlation between the various components of your IT infrastructure?
There are different ways you can improve your IT infrastructure's overall dataset correlation options, each with different cost, timescale, performance and robustness implications. Here are a few approaches:
1. Self-build in-house
Many companies attempt to address data correlation issues by building solutions in-house. This can work well for building something bespoke, but be prepared for it to turn into a very large project, as there are several moving parts. You'll need software engineers (to connect to the respective APIs), data engineers (to correctly model the data flows and data pipelines), data-governance specialists (to ensure you're adhering to GDPR and other data policies), and data scientists / analysts and data-visualisation experts (to convert the modelled data into easily understandable insights). And as this can easily become a whole-team effort, you'll need a project manager or product owner too.
2. Buying in (yet more) software tools
Another typical approach to resolving data correlation issues is buying in off-the-shelf tools and then stitching their outputs together. For example, you might purchase Alteryx for the ETL and Tableau, Sisense or Looker for your data visualisation. Whilst these tools are good in their own right, they aren't going to connect to all your data sources, nor will they handle all the data flows or data automation for you. Systems built out of lots of bolt-on components can also easily fall apart; they are only as good as their weakest link, which means they risk ending up like the UK's recent test-and-trace project, in which Excel was a very poor system choice and critical life-and-death data was simply lost.
3. Correlated Data. Built exactly to your specification
An entirely different approach to data correlation is to have all your API connectors, data modelling, data visualisations, dashboards, portal access and so on fully built for you. The advantage of this approach is that you'll be using a team that already does this day in, day out, and your work will sit atop a proven code base, which means you get a high-quality build and you'll be well set up for scale and continuous improvement.
This 'built-for-you' approach also means a third party is responsible for the data connections, the data modelling, keeping all those live connections running and using best-in-class data visualisations to communicate insights, so the headache is off your plate. A great option here is a platform like Stratiam. The team behind Stratiam already has a growing library of API connectors ready to plug in to your data sources and get you set up quickly. Custom connectors can also be created for any API, and the entire system has been built to be fully compliant with GDPR and other policies and is thoroughly pen-tested on a regular basis.
Whichever approach you take, improving how your IT datasets cross-correlate will add massive value to your overall IT infrastructure and will bolster your IT Security & Monitoring capabilities. If you don't yet have plans in place to address this, they should be added to your roadmap for the coming year.