The goal of a data quality program is to build trust in data. However, trust is an expansive and often ill-defined term that can span many of the disciplines that control and manage data. Trusted data is possible when all the components of the metadata management platform work as a single unit. For example, without accurate data, it is very difficult to ensure that data security and privacy programs will work as envisaged. Building this trust should be a primary goal of chief data officers (CDOs).
However, so many organizations have failed at repeated data governance attempts that the term itself is now effectively banned. The reality is that global compliance requirements are only increasing and, irrespective of what we call the data governance program, it is imperative that data quality be addressed.
The benefits of the modern data quality approach are:
1. Accountability: In the decentralized data delivery world of data mesh and data products, the modern approach allows business teams to take charge of data quality. After all, the domain owners are the subject matter experts and know their data best.
Business users augment the technical aspects of data quality by adding the business context needed to meet critical KPIs. Data quality then becomes a committed SLA in the packaged data products. That commitment keeps evolving as the data changes, which is why data products are released in new versions. The data consumer no longer has to second-guess whether to trust the data.
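To make the idea of a quality SLA packaged with a data product more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `QualitySLA` class, the `completeness` metric, the product name, the version string, and the 0.75 threshold are hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualitySLA:
    """Hypothetical quality commitment published with one data product version."""
    product: str
    version: str
    required_fields: tuple
    min_completeness: float  # minimum share of rows with all required fields set


def completeness(rows, required_fields):
    """Return the fraction of rows in which every required field is non-empty."""
    if not rows:
        return 0.0
    complete = sum(
        1
        for row in rows
        if all(row.get(field) not in (None, "") for field in required_fields)
    )
    return complete / len(rows)


def meets_sla(rows, sla):
    """True when the observed completeness satisfies the committed threshold."""
    return completeness(rows, sla.required_fields) >= sla.min_completeness


# Illustrative data only: one of four rows is missing its amount.
sla = QualitySLA("customer_orders", "1.2.0", ("order_id", "amount"), 0.75)
rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 7.5},
    {"order_id": 4, "amount": 3.0},
]
score = completeness(rows, sla.required_fields)
print(sla.product, sla.version, score, meets_sla(rows, sla))
```

Because the threshold lives alongside the product version, a change in the data or in the business context can be expressed as a new version with a new commitment, rather than as an undocumented change in behavior.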
2. Speed of delivery: ‘Data quality latency’ is the time between the arrival of new data and performing data quality checks and remediation on it.
More data is now generated in external data sources, such as SaaS products, than in internal systems. It comes in multiple formats and often arrives as real-time streams. Past techniques of landing the data in a single target location and performing data quality as a batch operation are no longer sufficient. The old static approach treated data quality as a standalone effort on data at rest that ran only at fixed intervals.
The modern ‘continuous quality’ approach is proactive and dynamic. It is in sync with DataOps principles, which include orchestration, automation, and CI/CD. This approach allows data teams to deliver data products faster. It permits organizations that were used to doing one release per quarter to accelerate and deliver many releases a week.
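The ‘data quality latency’ defined above is simply a time difference, which the following Python sketch computes for a batch schedule and for a continuous check. The function name and all timestamps are made up for illustration, not measurements from a real system.

```python
from datetime import datetime, timedelta


def data_quality_latency(arrived_at, remediated_at):
    """Time between data arrival and the end of quality checks and remediation."""
    if remediated_at < arrived_at:
        raise ValueError("remediation cannot finish before the data arrives")
    return remediated_at - arrived_at


# Illustrative timestamps: a record arrives at 09:00.
arrived = datetime(2024, 1, 1, 9, 0, 0)

# Batch approach (assumed): checks run in a nightly job that ends at 02:00.
batch_done = datetime(2024, 1, 2, 2, 0, 0)

# Continuous approach (assumed): the record is checked seconds after arrival.
stream_done = arrived + timedelta(seconds=5)

batch_latency = data_quality_latency(arrived, batch_done)
stream_latency = data_quality_latency(arrived, stream_done)
print("batch:", batch_latency)
print("continuous:", stream_latency)
```

Tracking this value per dataset is one way to show whether a move from fixed-interval checks to continuous checks is actually shortening the window in which bad data can be consumed.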
3. Higher productivity: One reason traditional approaches to data quality are unsuccessful is the enormous amount of effort and time needed to achieve the ultimate goal. Precious staff are bogged down in manually fixing data quality problems in downstream systems. Often, the time-consuming reconciliation takes place in a Microsoft Excel spreadsheet. This treats the symptoms and not the problem.
The modern approach of identifying and remediating the problems close to their origin saves time and cost. Through various automation capabilities offered by DataOps and through integration with the other aspects of data governance, this approach leads to higher productivity of the data teams.
4. Cost: As data volumes keep increasing, the system needs to scale automatically to support continuous quality. This is typically where cloud-based solutions help. However, even in the cloud, there are two ways to run data quality checks: one is via an agent that constantly monitors data in motion, and the other is to leave data at rest in the cloud data warehouse and use its pushdown features to run the checks there. Each option serves unique use cases and comes with its own architecture and cost trade-offs.
In the former approach, data quality issues are detected before the data lands in a target analytical system. This is useful for anomaly detection in the case of streaming data. However, it will require a processing engine, such as an Apache Spark cluster.
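As a rough illustration of checking data in motion, the sketch below flags anomalous values in a stream before anything is written to a target system. It is plain Python standing in for a real processing engine. The function name, the window size, the threshold, and the sample readings are all assumptions, and the rolling z-score rule is just one simple technique among many.

```python
from collections import deque
from statistics import mean, pstdev


def flag_anomalies(stream, window=5, threshold=3.0):
    """Yield (value, is_anomaly) pairs using a rolling z-score over recent values."""
    recent = deque(maxlen=window)
    for value in stream:
        is_anomaly = False
        # Only judge a value once the window holds enough history.
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                is_anomaly = True
        yield value, is_anomaly
        # Keep flagged values out of the baseline so they do not distort it.
        if not is_anomaly:
            recent.append(value)


# Illustrative readings with one obvious outlier.
readings = [10, 11, 9, 10, 12, 10, 95, 11, 10]
results = list(flag_anomalies(readings))
anomalies = [value for value, bad in results if bad]
print(anomalies)
```

In a production pipeline the same per-record logic would typically run inside the streaming engine, so that flagged records can be quarantined or corrected before they reach the analytical system.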
In the latter case, data first lands in an analytical system, such as Snowflake, and then the data quality product generates SQL queries that run right inside the storage engine. This option minimizes data movement and hence may be more secure. It can also take advantage of the auto-scale features of the analytical system.
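The pushdown pattern can be sketched as follows: the check is expressed as generated SQL, the engine executes it, and only the aggregate result leaves the system. This example uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse. The `orders` table, its columns, and the `null_check_sql` helper are hypothetical, and a real product would generate far richer queries.

```python
import sqlite3


def null_check_sql(table, column):
    """Build a query that counts total rows and NULL values inside the engine.

    Identifiers are interpolated directly, so they must come from trusted
    metadata and never from user input.
    """
    return (
        f"SELECT COUNT(*) AS total_rows, "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_rows "
        f"FROM {table}"
    )


# In-memory SQLite database used only as a stand-in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 7.5), (4, None)],
)

# The engine scans the rows; only two numbers are returned to the caller.
query = null_check_sql("orders", "amount")
total_rows, null_rows = conn.execute(query).fetchone()
null_rate = null_rows / total_rows
print(query)
print(total_rows, null_rows, null_rate)
conn.close()
```

The design point is that the raw rows never leave the storage engine, which is why this option minimizes data movement, while the compute cost of the scan is billed by the analytical system itself.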
Architects should analyze the total costs of each option to assess the appropriate architecture.