2020-11-23 CTWG Meeting Notes
Meeting Date
23 November 2020
Attendees
@David Luchuk
@Drummond Reed
@Rieks Joosten
@Daniel Hardman
@Dan Gisolfi
@Scott Perry
@Steven Milstein
Main Goal of this Meeting:
Understand (and if possible decide on) @Daniel Hardman's proposal for tooling and overall workflow.
Agenda
| Time | Item | Lead | Notes |
| 1 min | Welcome & Antitrust Policy Notice | Chairs | |
| 2 mins | Introduction of new members | Chairs | |
| 2 mins | Agenda review & open Action Items | Chairs | |
| 5 mins | Co-Chair volunteers | Chairs | |
| 35 mins | Presentation and discussion on tooling and workflow | @Daniel Hardman | |
| 12 mins | Integration with Operations Team | @David Luchuk | |
| 2 mins | Review of Decisions and new Action Items | Chairs | |
| 1 min | Next meeting | Chairs | |
Recording
Presentation(s)
Documents
File 1 - link
File 2 - link
File 3 - link
Notes
Welcome and Linux Foundation antitrust policy
Introduction of new members
Agenda review & open Action Items
@Daniel Hardman presented his slides and recommendations about terminology tooling
His evaluation of existing tooling (open source and commercial) is that nothing still being maintained fits our needs well
We need "just enough tooling"
Daniel proposes the following data pipeline
Capture—receive raw data from a one-off ticket, batch-submission PR, or script
Scan—human sanity check to triage, catch basic issues
Merge—commit to repo, convert to internal data model, assign permalinks, becomes publishable
Mature—Run (semi-)automated QA, generate tickets, propose "Accepted" status for the community (= the WG), and assign tickets to curators of other communities.
Accept—Review and adjust ticket statuses in WG meeting.
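The five stages above could be modeled as a simple state machine; a minimal sketch in Python (stage names are from the slides, everything else here is hypothetical):

```python
from enum import Enum

class Stage(Enum):
    """Stages of the proposed terminology data pipeline."""
    CAPTURE = 1  # receive raw data (one-off ticket, batch PR, or script)
    SCAN = 2     # human sanity check to triage, catch basic issues
    MERGE = 3    # commit to repo, convert to internal model, assign permalinks
    MATURE = 4   # (semi-)automated QA, generate tickets
    ACCEPT = 5   # review and adjust ticket statuses in WG meeting

def next_stage(stage: Stage) -> Stage:
    """Advance a submission to the next pipeline stage."""
    members = list(Stage)
    i = members.index(stage)
    if i + 1 >= len(members):
        raise ValueError("Submission has already been accepted")
    return members[i + 1]
```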
Daniel proposed a basic data model (see the slides)
Rieks noted: "I'm concerned about the relation between concept and term having 1 - n multiplicities rather than n - m multiplicities. To be discussed."
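To make Rieks's concern concrete, a hypothetical sketch of the two entities (field names are illustrative, not from the slides); storing a list of concept numbers on the term is one way to allow the n - m relation he asked for:

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    num: int                                     # stable concept number, e.g. 119
    labels: list = field(default_factory=list)   # EN labels; first one is primary

@dataclass
class Term:
    label: str                                   # e.g. "agent"
    # A 1 - n model would store a single concept number here; a list
    # permits the n - m multiplicity Rieks raised for discussion.
    concept_nums: list = field(default_factory=list)
```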
Daniel proposed a process by which every stakeholder community can review and decide on the status of a term without necessarily having to agree with other communities
Daniel proposed two major requirements for our tooling
Major feature #1: Manage Curation
Anybody can propose content
Tickets are the way to change content status
Anybody can raise a ticket
Review tickets are tied to a community (scope)
Each community has its own status
Each community has its own review process and appoints one or more "curators" <== term proposed by Daniel
Curators directly update status for their community (or admins update per instructions from a curator)
Enforce some data integrity rules and workflows
Track contributors, history
Stats
Major Feature #2: Publish
Emit content per community
Timely updates (realtime desirable)
Artifacts can be styled/customized per community
Static, searchable, indexable HTML
One doc, or one doc per term / concept
Stable relative links
Programmable data (CSV or JSON) and/or API
An example is writing a script to analyze a glossary or a group of terms
Full metadata available
Contribution history
Status change history
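As an illustration of the "programmable data" requirement, a hypothetical script that loads a JSON export of a glossary and counts entries by status (the file layout and "status" field are assumptions, not a decided format):

```python
import json
from collections import Counter

def status_counts(path: str) -> Counter:
    """Count glossary entries by status in a JSON export.

    Assumes the export is a JSON array of objects,
    each with a "status" field.
    """
    with open(path) as f:
        entries = json.load(f)
    return Counter(entry["status"] for entry in entries)
```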
Data and permalinks
Live data should be in the internal data model
Browsable in internal data model
https://github.com/<repo>/terms/agent-119 (term named by first EN label + concept num)
https://github.com/<repo>/concepts/119-agent (concept named by num + first EN label)
Hyperlinked to issues
Links are stable across changes to terms and definition text <== permalinks are in place, so terms can be deprecated and still resolve
Published data
Browsable in glossary data model format
Published by communities on sites under their control (they put static HTML where they want)
<glossary website>/agent.html (no concept links)
Links are versioned (not guaranteed stable across releases) <== not permalinks
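The two permalink naming schemes above (term = first EN label + concept number; concept = concept number + first EN label) could be generated by helpers like these (a sketch; the exact slug rules are not yet decided):

```python
def term_permalink(label: str, concept_num: int) -> str:
    """Term permalink: first EN label + concept number, e.g. 'terms/agent-119'."""
    return f"terms/{label.lower()}-{concept_num}"

def concept_permalink(concept_num: int, label: str) -> str:
    """Concept permalink: concept number + first EN label, e.g. 'concepts/119-agent'."""
    return f"concepts/{concept_num}-{label.lower()}"
```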
Dan asked if we could use GitHub Actions to publish "live data"
Daniel said yes, that would result in the published data reflecting the live data
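A GitHub Actions workflow along these lines could do that; this is a hedged sketch only (the paths, script name, and publishing action are placeholders, not decisions of the WG):

```yaml
# Hypothetical workflow: rebuild and publish static HTML whenever
# the terminology data changes on the main branch.
name: publish-glossary
on:
  push:
    branches: [main]
    paths: ['terms/**', 'concepts/**']
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Generate static HTML
        run: python scripts/emit_html.py --out site/
      - name: Publish to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: site/
```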
Drummond asked about how a version of a glossary can be "frozen" for a specific community, i.e., a spec
Daniel said that the community could fork off a version of their glossary
You can also point off to a specific version of the data at any point in time.
Specific tool proposals
terminology database = github repo
Ingest new data as Markdown documents in GitHub
Then, after processing into the internal data model, keep each "table" as a Markdown document in GitHub
to do QA for WG review of submitted data: new python script (Daniel volunteering but inviting others)
to manage internal data model
to convert from submit format to internal data model: new python script (Daniel volunteering but inviting others)
to edit and browse internal data model: modified ESSIF / GRNet tool (the one Rieks has developed)
to update status, add hyperlinks, propagate tags: new python script(s)
to emit static HTML: github action hooked up to #2 in preceding bullet
to emit programmable data (CSV, ...)—TBD
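The "convert from submit format to internal data model" script might start out like this; the submission format shown (a heading followed by a definition paragraph) is purely an assumption for illustration:

```python
import re

def parse_submission(markdown: str) -> dict:
    """Parse a term submission written as Markdown.

    Assumes a simple, hypothetical submission format:
        # <term>
        <definition paragraph>
    """
    match = re.match(r"#\s*(?P<term>.+?)\n+(?P<definition>.+)",
                     markdown.strip(), re.DOTALL)
    if not match:
        raise ValueError("Submission does not match expected format")
    return {
        "term": match.group("term").strip(),
        "definition": match.group("definition").strip(),
        "status": "proposed",  # all new submissions start as proposed
    }
```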
Configuring a community
Provide official name and #tag
Identify and train curators (github handles, contact info)
Configure artifacts
Configure data import
Train community on curation and publication processes
Configuring an artifact
Choose publication mechanism (output template, scripts, targets, collateral)
Setup schedule or triggers for publication
Provide selection criteria (tags, statuses)
Test run
Configuring data import
One-time, ad-hoc, ongoing?
Write and/or tune script(s)
Dry runs with cleanup
First import
Trigger for deltas
Working Group Duties
Triage tickets
Train communities
Setup communities and artifacts
Liaise with communities
Approve "Accepted" status requests
Propose new data sources
Configure and maintain tool integrations
Develop output templates
Run tools for ingestion
Review data quality
Other proposals—for our actions
Publish draft glossaries from our 3 datasets
Data sanity check
Convert to internal data model
Configure artifacts and export
Assign community curators to approve
Forcing function for tools: first cut by mid Dec?
Divvy up WG work for #1 (tickets)
Figure out collaboration model outside WG meetings
Modify agenda so we spend a chunk of our time working tickets
Co-chairs
At the conclusion of his presentation, @Daniel Hardman volunteered to be a co-chair.
@Drummond Reed volunteered to join him in the interest of helping to coordinate vocabulary with the Governance Stack WG (likely the CTWG's biggest customer), with other WGs, and across the wider ecosystem (he was one of the original co-chairs of the Decentralized Identity Foundation (DIF) Glossary WG).
Daniel and Drummond were approved as the initial co-chairs of the CTWG by consensus.
We ran 15 minutes long, so there was no time for further agenda items.
The next meeting is at the regular time in two weeks.