2023-02-07 DMRWG Meeting Notes
Meeting Date
The DMRWG meets bi-weekly on Tuesdays at 12:00-13:00 PT / 16:00-17:00 UTC. Check the ToIP Calendar for meeting dates.
Zoom Meeting Link / Recording
- Recording - the meeting starts at 9:00; discussion prior to that is captured in the notes (below)
(This link will be replaced with a link to the recording of the meeting as soon as it is available)
- Slides - Authentic Provenance Chains for Verifiable Data Registries
Attendees
Main Goal of this Meeting
Authentic Provenance Chains - Functionality and Data/Structure requirements for Verifiable Data Registries.
Agenda Items and Notes (including all relevant links)
Time | Agenda Item | Lead | Notes |
5 min |
| Chairs |
|
5 mins | Review of action items from previous meeting | Chairs | |
5 mins | Announcements | TF Leads | News or events of interest to Data Modelling & Representations WG members:
|
20 mins | Authentic Provenance Chains | Chairs | A key requirement for Verifiable Data Registries is an audit trail of all the processing and all the people (producers and governance stewards) who touched the data from the point of data creation/capture through a published dataset. This includes the following requirements:
Today's discussion looks at a model of how that could be implemented, exploring the functionality and data (and data structures) that Verifiable Trust Registries and their listed trusted entities need in order to be trustable in a robust and secure manner.
The presentation is concerned with Authentic Provenance Chains: a "trust chain" (which ACDC supports) that uses cryptographic signing of data (and all objects are data), counter-signed by the data processing authority (the representative of the organization/team that processed the data) and the governance authority (the organization representative responsible for ensuring governance compliance). Each element is also chained to the previous data processing steps and to support artifacts (the processing scripts applied to the data, and testing).
We will look at Verifiable Data Registries, which are registries of objects (which are data) that have been verified and validated by a governed process that performs "due diligence" on the objects and on their data and processing lineage, whether by software/hardware, manual processing or some combination. Trust Registries are a sub-type: a Trusted SSI Component/Role Registry, where the subjects of the first use case are Issuers and Verifiers.
The underlying principle for trust chains is "Authentic Data": the creator/publisher/owner of the data uses a private key under their control to sign a hash of the data (a crypto-hash) and publishes the corresponding public key. Any consumer of the data can then verify the signed crypto-hash of the data with the public key.
There may be a distinction between a Verifiable Data Registry (Authentic Data Registry?) and a Repository. A V/ADR provides proof of the data's lineage; a Repository may be the database which provides access services for the data (and the two may be combined).
Data can go through a series of processing steps from collection/input to publishing:
A data trust chain will link the published dataset to the raw dataset and to the personnel or devices that collected the data.
Note: metadata associated with the dataset (e.g. OCA model), plus software licensing and other data administrative artifacts, should also be included/linked.
What is different about the Chain Element (in an Authentic Data Provenance Chain, ADPC) vs ACDC is:
The principle here is that data and supporting documents are not moved to the ADPC but are referenced, where the references are crypto-secured. This allows the same data and artifacts to be included in multiple ADPCs and partitions the chains from the storage management of the referenced data.
Question: where will ADPCs (the links) be stored, and with what structure/model?
Structure & other requirements include:
Where ADPCs are stored or managed will be up to the Ecosystem. Persistence of this type of audit trail can take a page from aircraft and train system black box logging, or from any operational system where accident/incident analysis requires records to be kept, potentially for years (in some form, not always at full fidelity).
The issue of private/sensitive data that may be on the chain was discussed. Proof that a provenance chain exists is something that will be publicly required, but it can take the form of a zero-knowledge proof (not necessarily cryptographic). An example would be an Issuer (e.g., GLEIF) who must perform due diligence (Know Your Customer) on a person or organization. In order to issue a verifiable credential, the issuer should ideally have a record of all the due diligence information captured and referenced in ADPC links, but that information is available only for an authorized external audit. Otherwise, it is sensitive and private data.
Assumptions:
Observation: the most important parts of the ADPC are the signatures and their provenance, back to the people authorized to sign.
APIs - some requirements
More on walking to the root of trust: what are the trust checkpoints in interacting with any entity in an SSI?
Some observations on trust checkpoints for the other party in a two-party exchange:
Unknown:
Academic reputation (of data & data sets) is based on:
Observation on a verifiable credential application for resumes, which consist of "signed" affidavits of an academic degree and of time, skills and job description for employment.
This is a starting point, which could be extended from the paper evidence to crypto-signed replacements, which get deeper into the backing information over time (e.g., from a degree to marks on each course, plus links to the teacher/professor of the course, etc.).
Looking at these different trace paths, they form a labelled graph: a top verified object (the academic degree) which follows to multiple roots.
Observation: any SSI object (entity, dataset, VC, etc.) should have an API to allow the object to provide proof of its own history and provenance on data and trust checkpoint criteria.
Assumption: there is a partition between data about an SSI object in a Trust List/Registry or in an ADPC chain vs. data access through the object. There may be information about the history and provenance of medical data records, but the actual medical data is not accessible by the Trust List/Registry API, only its proof chain and data hash. The hash of the actual data would be delegated to the SSI object, preserving privacy control with the data and data controlling component.
Summary (by one of the meeting members): the components for this kind of system are
- someplace (many places) where the data is stored.
- a provenance chain that is either stored on a distributed ledger OR stored locally but occasionally hashed onto a distributed ledger.
- some kind of VDR where the keys that were used to sign chain elements can be verified.
- some kind of governance of the ecosystem (components of this ecosystem) which establishes the necessary standards to ensure trust in the provenance.
Observations:
|
15 mins | SSI Risks | A core principle of Governance is to prioritize based on risk for an application, service or data source, the organization, its personnel and its financial/legal state/reputation. The current state of cybersecurity points to a number of key risks to which SSI is (also) vulnerable:
Some examples pertinent to SSI:
For example, if you are a root administrator on a server, and a service relies on the server's cryptographic libraries, the admin can swap out the libraries for a compromised set. "Authentic" methods, e.g., crypto-signing an executable and related support files (the libraries), controlled by the actual administrator (or other authority), can provide tamper resistance: the service can verify a library by rehashing it and checking the signature over that hash with the public key of the signing authority.
Another example: "If you are not the person who is building the compiler, anyone can add a back door and you would never know". | |
5 mins | Any other business | ||
5 mins |
| Chairs | Plan for the next meeting (Feb 21, 2023): Sam Smith will present on the experience with authentic provenance chains in the GLEIF project using ACDC. |
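Illustration for the Authentic Provenance Chains notes above: a minimal Python sketch of the "reference, don't move" principle, where each chain element holds only crypto-hashes of the data and support artifacts plus the hash of the previous element. This is not the model presented in the meeting and not ACDC. The names `ChainElement`, `append`, `verify_chain` and the `store` dictionary are hypothetical, and the counter-signatures of the data processing authority and the governance authority are deliberately left out, so the sketch only shows linkage and data integrity, not who signed.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def digest(data: bytes) -> str:
    """SHA-256 hex digest, used for data references and for element links."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ChainElement:
    """Hypothetical chain element: it references data by hash, never stores it."""
    step: str                          # e.g. "capture", "clean", "publish"
    data_digest: str                   # reference to the dataset at this step
    artifact_digests: Tuple[str, ...]  # references to scripts, test reports, etc.
    prev: Optional[str]                # digest of the previous element (None at the root)

    def element_digest(self) -> str:
        body = json.dumps(
            {"step": self.step, "data": self.data_digest,
             "artifacts": list(self.artifact_digests), "prev": self.prev},
            sort_keys=True,
        ).encode()
        return digest(body)


def append(chain: List[ChainElement], step: str, data: bytes,
           artifacts: Tuple[bytes, ...] = ()) -> ChainElement:
    """Add a processing step, linking it to the current head of the chain."""
    prev = chain[-1].element_digest() if chain else None
    element = ChainElement(step, digest(data),
                           tuple(digest(a) for a in artifacts), prev)
    chain.append(element)
    return element


def verify_chain(chain: List[ChainElement], store: Dict[str, bytes]) -> bool:
    """Check every link from the root onward and that referenced data is unchanged."""
    expected_prev = None
    for element in chain:
        if element.prev != expected_prev:
            return False  # link to the previous step is broken
        data = store.get(element.data_digest)
        if data is None or digest(data) != element.data_digest:
            return False  # referenced data is missing or was altered
        expected_prev = element.element_digest()
    return True


# Example data; the store stands in for wherever the ecosystem keeps the data.
raw = b"sensor readings v1"
cleaned = b"sensor readings v1, outliers removed"
script = b"contents of the cleaning script"
store = {digest(raw): raw, digest(cleaned): cleaned}

chain: List[ChainElement] = []
append(chain, "capture", raw)
append(chain, "clean", cleaned, artifacts=(script,))
print(verify_chain(chain, store))
```

Because elements carry only digests, the same dataset or script can be referenced from several chains, and the chain can be kept separately from the storage of the data it points to. A fuller version would add the two signatures over `element_digest()` and a way to resolve each signer's public key.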
Screenshots/Diagrams (numbered for reference in notes above)
#1
Decisions
- Sample Decision Item
Action Items
- Sample Action Item