2020-11-23 CTWG Meeting Notes
Meeting Date
- 23 November 2020
Attendees
Main Goal of this Meeting:
Understand (and if possible decide on) Daniel Hardman's proposal for tooling and overall workflow.
Agenda
Time | Item | Lead | Notes |
1 min | Welcome & Antitrust Policy Notice | Chairs | |
2 mins | Introduction of new members | Chairs | |
2 mins | Agenda review & open Action Items | Chairs | |
5 mins | Co-Chair volunteers | Chairs | |
35 mins | Presentation and discussion on tooling and workflow | | |
12 mins | Integration with Operations Team | | |
2 mins | Review of Decisions and new Action Items | Chairs | |
1 min | Next meeting | Chairs | |
Recording
Presentation(s)
Documents
Notes
- Welcome and Linux Foundation antitrust policy
- Introduction of new members
- Agenda review & open Action Items
- Daniel Hardman presented his slides and recommendations about terminology tooling
- His evaluation of existing tooling (open source and commercial) is that nothing that is still maintained really fits our needs well
- We need "just enough tooling"
- Daniel proposes the following data pipeline
- Capture: receive raw data from a one-off ticket, a batch-submission PR, or a script
- Scan: human sanity check to triage and catch basic issues
- Merge: commit to the repo, convert to the internal data model, and assign permalinks; the entry becomes publishable
- Mature: run (semi-)automated QA, generate tickets, propose "Accepted" status for community = WG, and assign tickets to curators of other communities
- Accept: review and adjust ticket statuses in the WG meeting
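The five stages above can be read as an ordered sequence. The following is a minimal sketch of that reading, not part of Daniel's slides: the names `Stage`, `next_stage`, and `is_publishable` are hypothetical, and the rule that an entry is publishable from Merge onward is taken from the Merge bullet.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages, numbered in the order listed in the notes."""
    CAPTURE = 1  # raw data received
    SCAN = 2     # human sanity check
    MERGE = 3    # committed, converted, permalinks assigned
    MATURE = 4   # QA run, tickets generated
    ACCEPT = 5   # ticket statuses reviewed in the WG meeting


def next_stage(stage: Stage) -> Stage:
    """Return the stage that follows `stage`; ACCEPT has no successor."""
    if stage is Stage.ACCEPT:
        raise ValueError("ACCEPT is the final stage")
    return Stage(stage + 1)


def is_publishable(stage: Stage) -> bool:
    """Assumption from the Merge bullet: entries are publishable once merged."""
    return stage >= Stage.MERGE
```

An entry at `Stage.SCAN` would not yet be publishable under this reading, while one at `Stage.MERGE` or later would be.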
- Daniel proposed a basic data model (see the slides)
- Rieks noted: "I'm concerned about the relation between concept and term having 1 - n multiplicities rather than n - m multiplicities. To be discussed."
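To make Rieks's concern concrete, here is a small sketch of what an n - m relation between concepts and terms could look like. It assumes that 1 - n means each term belongs to exactly one concept; the `Glossary` class, the concept number 204, and the label "delegate" are hypothetical and do not come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Glossary:
    # Each (term_label, concept_num) pair is one link, so a term may point to
    # several concepts and a concept may carry several terms (n - m).
    links: set[tuple[str, int]] = field(default_factory=set)

    def link(self, term: str, concept_num: int) -> None:
        self.links.add((term, concept_num))

    def concepts_for(self, term: str) -> set[int]:
        return {c for t, c in self.links if t == term}

    def terms_for(self, concept_num: int) -> set[str]:
        return {t for t, c in self.links if c == concept_num}


g = Glossary()
g.link("agent", 119)
g.link("agent", 204)     # hypothetical second concept for the same term
g.link("delegate", 119)  # hypothetical second term for concept 119
```

In a 1 - n model the second `link("agent", ...)` call would have to be rejected or would need a distinct term entry, which is the trade-off left open for discussion.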
- Daniel proposed a process by which every stakeholder community can review and decide on the status of a term without having to necessarily agree with other communities
- Daniel proposed two major requirements for our tooling
- Major feature #1: Manage Curation
- Anybody can propose content
- Tickets are the way to change content status
- Anybody can raise a ticket
- Review tickets are tied to a community (scope)
- Each community has its own status
- Each community has its own review process and appoints one or more "curators" <== term proposed by Daniel
- Curators directly update status for their community (or admins update per instructions from a curator)
- Enforce some data integrity rules and workflows
- Track contributors, history
- Stats
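The curation rules above (per-community status, curator-only updates, tracked history) might be sketched as follows. This is an illustration under assumptions, not the proposed tooling: `TermRecord`, `update_status`, the community tags, and the curator handles are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    label: str
    # community tag -> status; each community keeps its own status
    status: dict = field(default_factory=dict)
    # (actor, community, new_status) tuples, oldest first
    history: list = field(default_factory=list)


def update_status(record, curators, actor, community, new_status):
    """Set `community`'s status on `record` if `actor` is one of its curators.

    `curators` maps a community tag to the set of curator handles.
    """
    if actor not in curators.get(community, set()):
        raise PermissionError(f"{actor} is not a curator for {community}")
    record.status[community] = new_status
    record.history.append((actor, community, new_status))


# Hypothetical communities and curator handles.
curators = {"#ctwg": {"alice"}, "#gswg": {"bob"}}
agent = TermRecord("agent")
update_status(agent, curators, "alice", "#ctwg", "Accepted")
```

Because status is keyed by community, a curator for one community can accept a term without affecting the status that any other community assigns to it.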
- Major feature #2: Publish
- Emit content per community
- Timely updates (realtime desirable)
- Artifacts can be styled/customized per community
- Static, searchable, indexable HTML
- One doc, or one doc per term / concept
- Stable relative links
- Programmable data (CSV or JSON) and/or API
- An example is writing a script to analyze a glossary or a group of terms
- Full metadata available
- Contribution history
- Status change history
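As an illustration of the "programmable data" requirement, the sketch below analyzes a glossary exported as JSON. The export format is not yet defined, so the field names (`label`, `concept`, `status`), the community tag, and the "Proposed" status are assumptions made for the example.

```python
import json
from collections import Counter

# Hypothetical export: a list of term objects with per-community statuses.
EXPORT = """
[
  {"label": "agent", "concept": 119, "status": {"#ctwg": "Accepted"}},
  {"label": "wallet", "concept": 120, "status": {"#ctwg": "Proposed"}},
  {"label": "holder", "concept": 121, "status": {}}
]
"""


def status_counts(terms, community):
    """Count terms by their status in one community ('none' if unset)."""
    return Counter(t["status"].get(community, "none") for t in terms)


terms = json.loads(EXPORT)
counts = status_counts(terms, "#ctwg")
```

The same function would work on a CSV export after loading it with the standard `csv` module, provided the per-community status is available as a column.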
- Data and permalinks
- Live data should be in the internal data model
- Browsable in internal data model
- https://github.com/<repo>/terms/agent-119 (term named by first EN label + concept num)
- https://github.com/<repo>/concepts/119-agent (concept named by num + first EN label)
- Hyperlinked to issues
- Links are stable across changes in terms, definition text <== permalinks are in place, so terms can be deprecated and still resolve
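The naming scheme for the two example paths above could be generated along these lines. This is a sketch only: the helper names are hypothetical, and the slug rule (lowercase, alphanumeric runs joined by hyphens) is an assumption, since the notes show only the single-word label "agent".

```python
import re


def slug(label: str) -> str:
    """Lowercase a label and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", label.lower()))


def term_path(first_en_label: str, concept_num: int) -> str:
    """Term path: first EN label followed by the concept number."""
    return f"terms/{slug(first_en_label)}-{concept_num}"


def concept_path(concept_num: int, first_en_label: str) -> str:
    """Concept path: concept number followed by the first EN label."""
    return f"concepts/{concept_num}-{slug(first_en_label)}"
```

For links to stay stable when a label later changes, the path would need to be computed once at merge time and stored, not regenerated from the current label.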
- Published data
- Browsable in glossary data model format
- Published by communities on sites under their control (they put static HTML where they want)
- <glossary website>/agent.html (no concept links)
- Links are versioned (not guaranteed stable across releases) <== not permalinks
- Dan asked if we could use GitHub Actions to publish "live data"
- Daniel said yes, that would result in the published data reflecting the live data
- Drummond asked how a version of a glossary can be "frozen" for a specific community, e.g., for a spec
- Daniel said that the community could fork off a version of their glossary
- You can also point to a specific version of the data at any point in time.
- Specific tool proposals
- terminology database = GitHub repo
- Ingest new data as Markdown documents in GitHub
- Then after processing into internal data model, still keep each "table" as a Markdown document in GitHub
- to do QA for WG review of submitted data: new Python script (Daniel volunteering but inviting others)
- to manage internal data model
- to convert from submit format to internal data model: new Python script (Daniel volunteering but inviting others)
- to edit and browse internal data model: modified ESSIF / GRNet tool (the one Rieks has developed)
- to update status, add hyperlinks, propagate tags: new Python script(s)
- to emit static HTML: GitHub Action hooked up to #2 in preceding bullet
- to emit programmable data (CSV, ...): TBD
- Configuring a community
- Provide official name and #tag
- Identify and train curators (GitHub handles, contact info)
- Configure artifacts
- Configure data import
- Train community on curation and publication processes
- Configuring an artifact
- Choose publication mechanism (output template, scripts, targets, collateral)
- Set up schedule or triggers for publication
- Provide selection criteria (tags, statuses)
- Test run
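The selection criteria for an artifact (tags and statuses) could be applied with a filter like the one below. This is a sketch under assumptions: the record fields, the tag `#ssi`, and the rule that a term must carry every requested tag are hypothetical choices, not decisions recorded in the meeting.

```python
def select_terms(terms, community, tags=(), statuses=()):
    """Return the terms that match an artifact's selection criteria.

    A term is kept if it carries every tag in `tags` and its status in
    `community` is one of `statuses`. An empty criterion matches everything.
    """
    wanted_tags = set(tags)
    selected = []
    for term in terms:
        if not wanted_tags <= set(term.get("tags", ())):
            continue
        if statuses and term.get("status", {}).get(community) not in statuses:
            continue
        selected.append(term)
    return selected


# Hypothetical records for a test run.
sample = [
    {"label": "agent", "tags": ["#ssi"], "status": {"#ctwg": "Accepted"}},
    {"label": "wallet", "tags": ["#ssi"], "status": {"#ctwg": "Proposed"}},
    {"label": "holder", "tags": [], "status": {"#ctwg": "Accepted"}},
]
accepted_ssi = select_terms(sample, "#ctwg", tags=["#ssi"], statuses=["Accepted"])
```

A test run for an artifact could call this filter and compare the selected labels against what the community expects to see published.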
- Configuring data import
- One-time, ad-hoc, ongoing?
- Write and/or tune script(s)
- Dry runs with cleanup
- First import
- Trigger for deltas
- Working Group Duties
- Triage tickets
- Train communities
- Set up communities and artifacts
- Liaise with communities
- Approve "Accepted" status requests
- Propose new data sources
- Configure and maintain tool integrations
- Develop output templates
- Run tools for ingestion
- Review data quality
- Other proposals—for our actions
- Publish draft glossaries from our 3 datasets
- Data sanity check
- Convert to internal data model
- Configure artifacts and export
- Assign community curators to approve
- Forcing function for tools: first cut by mid Dec?
- Divvy up WG work for #1 (tickets)
- Figure out collaboration model outside WG meetings
- Modify agenda so we spend a chunk of our time working tickets
- Co-chairs
- At the conclusion of his presentation, Daniel Hardman volunteered to be a co-chair.
- Drummond Reed volunteered to join him in the interest of helping to coordinate vocabulary both at the Governance Stack WG (who will likely be the CTWG's biggest customer), across other WGs, and also across the wider ecosystem (he was one of the original co-chairs of the Decentralized Identity Foundation (DIF) Glossary WG).
- Daniel and Drummond were approved as the initial co-chairs of the CTWG by consensus.
- We ran 15 minutes long, so there was no time for further agenda items.
- The next meeting is at the regular time in two weeks.
Decisions
- Daniel Hardman and Drummond Reed were approved as initial co-chairs of the CTWG.
- The CTWG will publish draft glossaries from our initial three datasets following the process proposed by Daniel Hardman.
Action Items
- Daniel Hardman to develop a plan for implementing his proposed workflow, including how to collaborate outside bi-weekly meetings.