Machine Readable Glossary - Implementation Notes

Overview

This page is intended for developers who want to work on and extend the Java MRG Generator. It gives a brief tour of the key features of the project. It is not a tutorial on Java or Spring Boot, since many resources for both already exist on the web, nor is it a description of the MRG generation process.

MRG Generator

Basics

The project is a Spring Boot application that exposes an HTTP API. In its current implementation it is expected to run locally on a curator's machine in a Docker container. The main reason is that, at the time of writing, ToIP does not have a managed environment where the application can be deployed as a web app, so Docker is the easiest way to bundle the application and the platforms it depends on without lengthy and complex installations.

The application has the following "vital statistics":




Source code: https://github.com/trustoverip/ctwg-toolkit-mrg
Language: Java 17
Frameworks: Spring Boot, Thymeleaf
Build tool: Maven
Test frameworks: JUnit Jupiter, AssertJ


Key Classes - Processors

For brevity, fully qualified class names are abbreviated as per convention, so org.trustoverip.ctwg.toolkit.mrg becomes o.t.c.t.m.
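
As an illustration of the abbreviation convention, here is a minimal sketch that shortens only the base package and leaves sub-packages and the class name intact. The PackageAbbreviator class and its abbreviate method are hypothetical helpers written for this page; they are not part of the project.

```java
// Hypothetical helper, not part of the MRG Generator code base.
class PackageAbbreviator {

    // Replaces each segment of basePackage with its first letter when fqn
    // lives in (or below) that package; any other name is returned unchanged.
    static String abbreviate(String fqn, String basePackage) {
        if (!fqn.equals(basePackage) && !fqn.startsWith(basePackage + ".")) {
            return fqn;
        }
        StringBuilder shortened = new StringBuilder();
        for (String segment : basePackage.split("\\.")) {
            if (shortened.length() > 0) {
                shortened.append('.');
            }
            shortened.append(segment.charAt(0));
        }
        // Keep whatever follows the base package (sub-packages, class name).
        return shortened + fqn.substring(basePackage.length());
    }
}
```

For example, passing org.trustoverip.ctwg.toolkit.mrg.api.MRGApi with the base package org.trustoverip.ctwg.toolkit.mrg gives o.t.c.t.m.api.MRGApi, which is the form used in the headings on this page.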

For each of the classes there is a comprehensive set of unit tests that describes the expected behaviour of the component. Developers are strongly recommended to use these tests as a reference, alongside the code and this page, to determine exactly how each component should work and to see it interacting, via mocks, with the classes it depends on. Here is an example of one of the tests:


@DisplayName("""
    Given a list of all items
    When filters chained with and
    Then should filter to leave appropriate items
    """)
@Test
void testListAnd() {
    List<Term> all = List.of(foo, bar, foobar, noo);
    List<Term> mustHaveBoth = all.stream()
        .filter(TermsFilter.of(TermsFilterType.tags, "foo")
            .and(TermsFilter.of(TermsFilterType.tags, "bar")))
        .collect(Collectors.toList());
    assertThat(mustHaveBoth).containsExactly(foobar);
}

o.t.c.t.m.MRGWebApp

This is the main class (i.e. the @SpringBootApplication), which starts the application and loads the Spring application context.

o.t.c.t.m.api.MRGApi

The application's web interface. This exposes two operations:

GET /ctwg/mrg

Returns a Thymeleaf web form (i.e. HTML) that enables the curator to enter the details needed in order to generate an MRG.

POST /ctwg/mrg

Expects o.t.c.t.m.api.MRGParams, which is essentially the set of fields from the above form.

Once the parameters are validated, this invokes the MRGGlossaryGenerator.

o.t.c.t.m.processors.MRGGlossaryGenerator

This class orchestrates the glossary generation. It can be run in two modes, local or web. Since the move to Spring Boot, however, the local path is somewhat deprecated and could be removed, as the application can be started in web mode locally just by running Spring.

The generate method is the key operation in the class and the log messages in this method describe the six steps:

  1. Parse the Scope Administration File
  2. Resolve local and remote scopes
  3. Create the terminology section of the MRG
  4. Parse terms from the local scope
  5. Parse (and filter) terms from the remote scope
  6. Write the MRG to file
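
The six steps above can be sketched as an ordered pipeline in which each step is logged before it runs. This is only an illustration of the orchestration pattern: the PipelineSketch class, its State holder and the placeholder step bodies are all hypothetical, and the real generate method works with the project's own model classes and connectors rather than plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the project.
class PipelineSketch {

    // Mutable state handed from step to step (a stand-in for the real models).
    static final class State {
        final List<String> log = new ArrayList<>();
        final List<String> entries = new ArrayList<>();
    }

    static State generate(List<String> localTerms, List<String> remoteTerms) {
        // LinkedHashMap keeps the steps in insertion order.
        Map<String, Consumer<State>> steps = new LinkedHashMap<>();
        steps.put("Parse the Scope Administration File", s -> { /* placeholder */ });
        steps.put("Resolve local and remote scopes", s -> { /* placeholder */ });
        steps.put("Create the terminology section of the MRG", s -> { /* placeholder */ });
        steps.put("Parse terms from the local scope", s -> s.entries.addAll(localTerms));
        steps.put("Parse (and filter) terms from the remote scope", s -> s.entries.addAll(remoteTerms));
        steps.put("Write the MRG to file", s -> { /* placeholder */ });

        State state = new State();
        int number = 1;
        for (Map.Entry<String, Consumer<State>> step : steps.entrySet()) {
            // Record the step name first, so a failure can be traced to a step.
            state.log.add("Step " + number++ + "/" + steps.size() + ": " + step.getKey());
            step.getValue().accept(state);
        }
        return state;
    }
}
```

Keeping the step names next to the step bodies means the log output doubles as documentation of the process, which is the role the log messages play in the generate method.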

The following classes support the MRGGlossaryGenerator:

o.t.c.t.m.connectors.GithubConnector

This uses the GitHub API for Java library (org.kohsuke:github-api) to connect to GitHub in order to read the curated texts and Scope Administration Files held there.

o.t.c.t.m.connectors.LocalFSConnector

This class is deprecated: as noted above, local access is no longer supported. It could be removed along with the other local-run hooks.

o.t.c.t.m.processors.ModelWrangler

The Generator uses a number of Java object models to hold the data in the MRG (see below). The ModelWrangler produces and manipulates these classes on behalf of the Generator.

o.t.c.t.m.processors.YAMLWrangler

Serialises and deserialises the Java object models to and from YAML, using the Jackson libraries and in particular Jackson's YAMLFactory.

o.t.c.t.m.processors.TermsFilter

This enables the terms filtering (adding and removing terms based on matching their attributes) specified in the MRG specification. This part of the specification was still under consideration when the first implementation was completed, so this area is likely to change.
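
The unit test shown earlier passes a TermsFilter straight to Stream.filter and chains it with and, which suggests that it behaves as a java.util.function.Predicate. The following is a minimal sketch of that pattern under that assumption. SketchTerm, TagFilterSketch and TermsFilterDemo are hypothetical stand-ins written for this page; the real Term and TermsFilter classes have more attributes and filter types.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for Term: just an id and a set of tags.
record SketchTerm(String id, Set<String> tags) {}

// Hypothetical filter that matches terms carrying a given tag.
final class TagFilterSketch implements Predicate<SketchTerm> {
    private final String tag;

    private TagFilterSketch(String tag) {
        this.tag = tag;
    }

    static TagFilterSketch ofTag(String tag) {
        return new TagFilterSketch(tag);
    }

    @Override
    public boolean test(SketchTerm term) {
        return term.tags().contains(tag);
    }
}

class TermsFilterDemo {
    // Returns the ids of the terms accepted by the filter, in input order.
    static List<String> idsMatching(List<SketchTerm> terms, Predicate<SketchTerm> filter) {
        return terms.stream()
                .filter(filter)
                .map(SketchTerm::id)
                .collect(Collectors.toList());
    }
}
```

Because Predicate already supplies the default methods and, or and negate, a filter written this way gets composition for free, for example TagFilterSketch.ofTag("foo").and(TagFilterSketch.ofTag("bar")) to keep only terms carrying both tags.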

Key Classes - Data

These are the most important classes in the object model.

o.t.c.t.m.model.SAFModel

This is a Java representation of a Scope Administration File and is used by the generator as the primary reference as to which version of a scope to use, the location of the terms, the location of the glossaries etc.

o.t.c.t.m.model.Scope

A local scope which will define the tags, location of the MRG, glossary etc.

o.t.c.t.m.model.GeneratorContext

This extracts data for each of the scopes (i.e. the one local scope and zero or more remote scopes) needed to make up the MRG and holds it in a convenient object. This object is used by the Generator and its dependents to determine where to fetch and store the data needed to construct an MRG.
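
To make the "one local scope and zero or more remote scopes" shape concrete, here is a rough sketch of a context object. Every name and field in it (ScopeLocationSketch, GeneratorContextSketch, scopetag, repository, curatedDir, glossaryDir) is an assumption made for illustration; check the real GeneratorContext class for the actual fields.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical: where one scope's curated texts and glossaries live.
record ScopeLocationSketch(String scopetag, String repository,
                           String curatedDir, String glossaryDir) {}

// Hypothetical context: exactly one local scope plus any number of remote scopes.
record GeneratorContextSketch(ScopeLocationSketch local,
                              List<ScopeLocationSketch> remotes) {

    GeneratorContextSketch {
        if (local == null) {
            throw new IllegalArgumentException("a local scope is required");
        }
        // Defensive copy so the context cannot change after construction.
        remotes = List.copyOf(remotes);
    }

    // Looks up a remote scope by its tag; empty if it is not part of this MRG.
    Optional<ScopeLocationSketch> findRemote(String scopetag) {
        return remotes.stream()
                .filter(remote -> remote.scopetag().equals(scopetag))
                .findFirst();
    }
}
```

Holding the locations in one immutable object means the Generator and its helpers can be handed a single argument instead of re-reading the Scope Administration File each time they need a location.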

o.t.c.t.m.model.Term

A curated term as it appears in the scope.

o.t.c.t.m.model.MRGEntry

A term as it appears in an MRG.

o.t.c.t.m.model.MRGModel

The Java representation of the file we will ultimately output.