Practical Data Contracts: First-Hand Insights from 9 Data Experts
Part 1: Why Contracts, Parallels to Software, and Sync with Data Design Architectures
Data contracts have been widely discussed in the data community. Unlike more volatile trends and strategies, contracts have proven tangible, both because they directly address recurring concerns in data stacks across organisations and industries and because we can experiment with them directly instead of getting trapped in conceptual debates. In this issue, we address some practical concerns and solutions regarding data contracts.
The good news is that we’ve curated a wealth of insights from an influential cast who have taken part hands-on in the ideation and development of the data contract paradigm. Yes, we are talking about founding engineers, authors, evangelists, and contributors to open specifications!
To make the experience more consumable, we’ve divided these insights across a three-part series:
Part 1: What and Why of Contracts
Part 2: Contract Ops
Part 3: Contract Ops (continued)
Our last release helped many readers access distilled information from a wide range of experts and form their own opinions on the suggested solutions and ideas, and this release is in a similar vein. We encountered varying approaches to data contracts: there is unanimous agreement on some strategies and stark differences of opinion on others, which could shape how we approach this novel technology.
Contributors
Ananth Packkildurai - Editor at Data Engineering Weekly & creator of Schemata.
Andrea Gioia - Partner and CTO at Quantyca & Co-Founder at Blindata
Andrew Jones - Coined “Data Contracts”, Author of Driving Data Quality with Data Contracts, Principal Engineer at GoCardless.
Animesh Kumar - Co-Founder & CTO at Modern, Contributor to the Data Developer Platform infrastructure specification.
Chad Sanderson - Founder & Chief Operator of Data Quality Camp, Former Head of Product at Convoy.
Jean-Georges "jgp" Perrin - Founder of jgp.ai, author of Spark in Action (Manning), co-founder & president of the non-profit AIDA User Group, which hosts the Open Data Contract Standard, fka PayPal's data contract template.
Sarah Floris - Founder of Dutch Engineer, Senior Data & Platform Engineer at Zwift.
Shane Murray - Field CTO at Monte Carlo.
Shirshanka Das - Co-founder and CTO, Acryl Data. Founder of the DataHub project.
So hold on to your seats and get ready to journey through the perspectives of some amazing minds in data! Learn first-hand how they approach data contracts from a practical viewpoint.
To make contracts more relatable for engineers out there, could you draw an analogy to the software world for data contracts?
Where would you typically apply data contracts? Could you cite examples of use cases or scenarios?
What are your opinions and recommendations on enforcing semantics, quality, and/or security in addition to just schema through data contracts?
Should entity relationships be defined as part of the contract? While defining attributes of an entity, you may want to refer to other entities.
How will users define contracts to work with APIs such as Salesforce or Facebook, which depend on multiple entities?
If you had to develop a minimum viable data contract for, say, a table, what would be the must-have attributes?
Could you give a practical example of how a data contract would enable a data product?
What's the most challenging aspect you see in the path to industry-wide adoption of contracts?
Have you come across any similar tools or technologies that could essentially serve the purpose of data contracts?
How do data contracts gel with popular design patterns like the data mesh?
We’ll wrap up this panel piece with the hope that it was helpful and illuminating. Watch this space for the next one, and feel free to drop your suggestions on topics you’d like to hear more about!
Interested in being featured in the next DQ Community Panel release? Drop a message on LinkedIn or reach out by email.
A huge thank you to our panel of experts:
Ananth Packkildurai - Founder of DEW & Data Engineer at Mural.
Andrea Gioia - Partner and CTO at Quantyca & Co-Founder at Blindata
Andrew Jones - Coined “Data Contracts”, Author of Driving Data Quality with Data Contracts, Principal Engineer at GoCardless.
Animesh Kumar - Co-Founder & CTO at Modern, Contributor to the Data Developer Platform infrastructure specification.
Chad Sanderson - Founder & Chief Operator of Data Quality Camp, Former Head of Product at Convoy.
Jean-Georges Perrin - Co-Founder of oplo.io & Author of Spark in Action (Manning). Contributor to PayPal’s Open Source Contract.
Sarah Floris - Founder of Dutch Engineer, Senior Data & Platform Engineer at Zwift.
Shane Murray - Field CTO at Monte Carlo.
Shirshanka Das - Co-founder and CTO, Acryl Data. Founder of the DataHub project.
Authors Connect
Samadrita is an advocate for DataOS, a unified infrastructure implementation that enables disparate data design architectures such as meshes and fabrics. She works closely with data leaders in the community to channel valuable insights and build impactful relationships.
Chad is the Chief Operator of the Data Quality Camp, the fastest-growing data quality community on the internet. He is a leading voice in the data industry and is actively involved in evangelizing and influencing data contracts - the pivotal architectural pattern that declaratively enables data quality and reliability.