2020-11-23 CTWG Meeting Notes
Meeting Date
23 November 2020
Attendees
@David Luchuk
@Drummond Reed
@Rieks Joosten
@Daniel Hardman
@Dan Gisolfi
@Scott Perry
@Steven Milstein
Main Goal of this Meeting:
Understand (and if possible decide on) @Daniel Hardman's proposal for tooling and overall workflow.
Agenda
| Time | Item | Lead | Notes |
| 1 min | Welcome & Antitrust Policy Notice | Chairs | |
| 2 mins | Introduction of new members | Chairs | |
| 2 mins | Agenda review & open Action Items | Chairs | |
| 5 mins | Co-Chair volunteers | Chairs | |
| 35 mins | Presentation and discussion on tooling and workflow | @Daniel Hardman | |
| 12 mins | Integration with Operations Team | @David Luchuk | |
| 2 mins | Review of Decisions and new Action Items | Chairs | |
| 1 min | Next meeting | Chairs | |
Recording
Presentation(s)
Documents
File 1 - link
File 2 - link
File 3 - link
Notes
Welcome and Linux Foundation antitrust policy
Introduction of new members
Agenda review & open Action Items
@Daniel Hardman presented his slides and recommendations about terminology tooling
His evaluation of existing tooling (open source and commercial) is that nothing still being maintained fits our needs well
We need "just enough tooling"
Daniel proposes the following data pipeline
Capture—receive raw data from a one-off ticket, batch-submission PR, or script
Scan—human sanity check to triage, catch basic issues
Merge—commit to repo, convert to internal data model, assign permalinks, becomes publishable
Mature—Run (semi-)automated QA, generate tickets, propose "Accepted" status for the community (= the WG), and assign tickets to curators of other communities.
Accept—Review and adjust ticket statuses in WG meeting.
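The five stages above could be modeled as a simple state machine; a minimal sketch in Python (stage names are from the slides, everything else here is hypothetical):

```python
from enum import Enum

class Stage(Enum):
    """Stages of the proposed terminology data pipeline."""
    CAPTURE = 1  # receive raw data (one-off ticket, batch PR, or script)
    SCAN = 2     # human sanity check to triage, catch basic issues
    MERGE = 3    # commit to repo, convert to internal model, assign permalinks
    MATURE = 4   # (semi-)automated QA, generate tickets
    ACCEPT = 5   # review and adjust ticket statuses in WG meeting

def next_stage(stage: Stage) -> Stage:
    """Advance a submission to the next pipeline stage."""
    members = list(Stage)
    i = members.index(stage)
    if i + 1 >= len(members):
        raise ValueError("Submission has already been accepted")
    return members[i + 1]
```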
Daniel proposed a basic data model (see the slides)
Rieks noted: "I'm concerned about the relation between concept and term having 1 - n multiplicities rather than n - m multiplicities. To be discussed."
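To make Rieks's concern concrete, a hypothetical sketch of the two entities (field names are illustrative, not from the slides); storing a list of concept numbers on the term is one way to allow the n - m relation he asked for:

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    num: int                                     # stable concept number, e.g. 119
    labels: list = field(default_factory=list)   # EN labels; first one is primary

@dataclass
class Term:
    label: str                                   # e.g. "agent"
    # A 1 - n model would store a single concept number here; a list
    # permits the n - m multiplicity Rieks raised for discussion.
    concept_nums: list = field(default_factory=list)
```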
Daniel proposed a process by which every stakeholder community can review and decide on the status of a term without necessarily having to agree with other communities
Daniel proposed two major requirements for our tooling
Major feature #1: Manage Curation
Anybody can propose content
Tickets are the way to change content status
Anybody can raise a ticket
Review tickets are tied to a community (scope)
Each community has its own status
Each community has its own review process and appoints one or more "curators" <== term proposed by Daniel
Curators directly update status for their community (or admins update per instructions from a curator)
Enforce some data integrity rules and workflows
Track contributors, history
Stats
Major Feature #2: Publish
Emit content per community
Timely updates (realtime desirable)
Artifacts can be styled/customized per community
Static, searchable, indexable HTML
One doc, or one doc per term / concept
Stable relative links
Programmable data (CSV or JSON) and/or API
An example is writing a script to analyze a glossary or a group of terms
Full metadata available
Contribution history
Status change history
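As an illustration of the "programmable data" requirement, a hypothetical script that loads a JSON export of a glossary and counts entries by status (the file layout and "status" field are assumptions, not a decided format):

```python
import json
from collections import Counter

def status_counts(path: str) -> Counter:
    """Count glossary entries by status in a JSON export.

    Assumes the export is a JSON array of objects,
    each with a "status" field.
    """
    with open(path) as f:
        entries = json.load(f)
    return Counter(entry["status"] for entry in entries)
```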
Data and permalinks
Live data should be in the internal data model
Browsable in internal data model
https://github.com/<repo>/terms/agent-119 (term named by first EN label + concept num)
https://github.com/<repo>/concepts/119-agent (concept named by num + first EN label)
Hyperlinked to issues
Links are stable across changes to terms and definition text <== permalinks are in place, so terms can be deprecated and still resolve
Published data
Browsable in glossary data model format
Published by communities on sites under their control (they put static HTML where they want)
<glossary website>/agent.html (no concept links)
Links are versioned (not guaranteed stable across releases) <== not permalinks
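The two permalink naming schemes above (term = first EN label + concept number; concept = concept number + first EN label) could be generated by helpers like these (a sketch; the exact slug rules are not yet decided):

```python
def term_permalink(label: str, concept_num: int) -> str:
    """Term permalink: first EN label + concept number, e.g. 'terms/agent-119'."""
    return f"terms/{label.lower()}-{concept_num}"

def concept_permalink(concept_num: int, label: str) -> str:
    """Concept permalink: concept number + first EN label, e.g. 'concepts/119-agent'."""
    return f"concepts/{concept_num}-{label.lower()}"
```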
Dan asked if we could use GitHub Actions to publish "live data"
Daniel said yes, that would result in the published data reflecting the live data
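A GitHub Actions workflow along these lines could do that; this is a hedged sketch only (the paths, script name, and publishing action are placeholders, not decisions of the WG):

```yaml
# Hypothetical workflow: rebuild and publish static HTML whenever
# the terminology data changes on the main branch.
name: publish-glossary
on:
  push:
    branches: [main]
    paths: ['terms/**', 'concepts/**']
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Generate static HTML
        run: python scripts/emit_html.py --out site/
      - name: Publish to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: site/
```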
Drummond asked about how a version of a glossary can be "frozen" for a specific community, i.e., a spec
Daniel said that the community could fork off a version of their glossary
You can also point off to a specific version of the data at any point in time.
Specific tool proposals
terminology database = github repo
Ingest new data as Markdown documents in GitHub
Then, after processing into the internal data model, keep each "table" as a Markdown document in GitHub
to do QA for WG review of submitted data: new python script (Daniel volunteering but inviting others)
to manage internal data model
to convert from submit format to internal data model: new python script (Daniel volunteering but inviting others)
to edit and browse internal data model: modified ESSIF / GRNet tool (the one Rieks has developed)
to update status, add hyperlinks, propagate tags: new python script(s)
to emit static HTML: github action hooked up to #2 in preceding bullet
to emit programmable data (CSV, ...)—TBD
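The "convert from submit format to internal data model" script might start out like this; the submission format shown (a heading followed by a definition paragraph) is purely an assumption for illustration:

```python
import re

def parse_submission(markdown: str) -> dict:
    """Parse a term submission written as Markdown.

    Assumes a simple, hypothetical submission format:
        # <term>
        <definition paragraph>
    """
    match = re.match(r"#\s*(?P<term>.+?)\n+(?P<definition>.+)",
                     markdown.strip(), re.DOTALL)
    if not match:
        raise ValueError("Submission does not match expected format")
    return {
        "term": match.group("term").strip(),
        "definition": match.group("definition").strip(),
        "status": "proposed",  # all new submissions start as proposed
    }
```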
Configuring a community
Provide official name and #tag
Identify and train curators (github handles, contact info)
Configure artifacts
Configure data import
Train community on curation and publication processes
Configuring an artifact
Choose publication mechanism (output template, scripts, targets, collateral)
Setup schedule or triggers for publication
Provide selection criteria (tags, statuses)
Test run
Configuring data import
One-time, ad-hoc, ongoing?
Write and/or tune script(s)
Dry runs with cleanup
First import
Trigger for deltas
Working Group Duties
Triage tickets
Train communities
Setup communities and artifacts
Liaise with communities
Approve "Accepted" status requests
Propose new data sources
Configure and maintain tool integrations
Develop output templates
Run tools for ingestion
Review data quality
Other proposals—for our actions
Publish draft glossaries from our 3 datasets
Data sanity check
Convert to internal data model
Configure artifacts and export
Assign community curators to approve
Forcing function for tools: first cut by mid Dec?
Divvy up WG work for #1 (tickets)
Figure out collaboration model outside WG meetings
Modify agenda so we spend a chunk of our time working tickets
Co-chairs
At the conclusion of his presentation, @Daniel Hardman volunteered to be a co-chair.
@Drummond Reed volunteered to join him in the interest of helping to coordinate vocabulary with the Governance Stack WG (likely the CTWG's biggest customer), with other WGs, and across the wider ecosystem (he was one of the original co-chairs of the Decentralized Identity Foundation (DIF) Glossary WG).
Daniel and Drummond were approved as the initial co-chairs of the CTWG by consensus.
We ran 15 minutes long, so there was no time for further agenda items.
The next meeting is at the regular time in two weeks.