2024-03-05 DMRWG Meeting Notes
Meeting Date
The DMRWG meets bi-weekly on Tuesdays at 12:00-13:00 PT / 16:00-17:00 UTC. Check the ToIP Calendar for meeting dates.
Zoom Recording & supporting material
Attendees
Main Goal of this Meeting
Discussion on integrating data from relational and other non-vector databases into LLM AI models, via DAG and other techniques
Agenda Items and Notes (including all relevant links)
Time | Agenda Item | Lead | Notes |
5 min | | Chairs | |
55 mins | Discussion | All | Neil Thomson began the conversation by discussing the importance of understanding certain topics, expressing uncertainty about how much Burak was prepared to discuss today. He mentioned that Anita Rao and Savida had talked about gathering some information on AI, suggesting a deeper dive into the subject might be needed, possibly in a separate session or during the AI discussion. Neil noted Wenjing's recent travels in Europe and his involvement in writing on the Trust Spanning Protocol, speculating on Wenjing's current engagement with AI-related work. Neil then invited Steven to share his opinions. Steven Milstein responded that he had posted in the chat about the complexity of SQL and structured data, based on an article that caught his attention. He shared his experience with metadata and with systems that generate their own metadata upon content ingestion, allowing for SQL queries on such data. Steven elaborated on the usefulness of databases and discussed the nuances of querying language models, highlighting the differences in search scope and the unpredictable nature of AI-generated responses. Neil Thomson then shifted the conversation to the integration of databases, assuming these databases have a schema that defines their structure. He expressed interest in how databases could co-present or be queried for information. Steven and Neil discussed the potential of using APIs to access databases and the simplicity of using ChatGPT for querying without writing any API code. The conversation then turned to data analytics, with Neil sharing insights from a BI analytics background. He talked about the corporate world's reliance on repeatable processes and the challenge of incorporating AI-generated insights into such a framework. Carly Huitema joined the discussion, emphasizing the human-like unpredictability of AI responses and the iterative nature of refining queries to get desired outcomes.
Neil further explored the concept of a traveler profile, emphasizing the need for it to be human-understandable and verifiable. The discussion broadened to include selective disclosure of personal information, the travel industry's role in managing personal data, and the challenges of maintaining a continuous, selective disclosure framework. Burak Serdar shared his experience with Neo4j and the different approaches of using structured data to create knowledge graphs, contrasting it with how LLMs generate semantic models from language. He highlighted the conceptual gap between traditional databases and LLMs, and the potential but challenging integration of Neo4j data with LLMs to generate more insightful answers. Steven and Burak discussed the possibility of embedding Neo4j into a vector store for LLMs, speculating on the outcomes and acknowledging the conceptual differences between the two technologies. The transcript concluded with further discussion on the potential applications of Neo4j and LLMs in generating knowledge graphs and the ongoing challenges of data privacy and consent in the use of personal data for AI and analytics. |
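Illustrative sketch (not presented in the meeting): the discussion assumed that relational databases carry a schema describing their structure, and contrasted structured data with the graph and language representations used alongside LLMs. The minimal Python example below, using only the standard library `sqlite3` module, shows one possible way to (a) summarize a schema as text that could be placed in an LLM prompt and (b) flatten rows into subject/predicate/object triples of the kind a knowledge graph might ingest. The `traveler` table, its columns, and the helper names `schema_summary` and `rows_to_triples` are hypothetical and chosen only for illustration; no approach was agreed by the group.

```python
import sqlite3


def schema_summary(conn):
    """Return one line per table in the form name(col TYPE, ...)."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_text = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"{table}({col_text})")
    return "\n".join(lines)


def rows_to_triples(conn, table, key):
    """Flatten each row into (subject, predicate, object) triples.

    The subject is "<table>:<key value>"; every other non-null column
    becomes one triple. Table and key names are trusted inputs here.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(names, row))
        subject = f"{table}:{record[key]}"
        for col, value in record.items():
            if col != key and value is not None:
                triples.append((subject, col, value))
    return triples


# Hypothetical sample data, loosely echoing the traveler profile topic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traveler (id INTEGER PRIMARY KEY, name TEXT, seat_pref TEXT)"
)
conn.execute("INSERT INTO traveler VALUES (1, 'Alice', 'aisle')")

summary = schema_summary(conn)
triples = rows_to_triples(conn, "traveler", "id")
print(summary)
print(triples)
```

The schema text could be prepended to a natural-language question so that a model has the table structure in context, while the triples are one candidate input for a graph store or an embedding step. Both are assumptions about how such an integration might start, not a description of any tool discussed.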
Supporting Material:
- Overlay Capture Architecture (home page for supporting material)
- Presentation - interoperability with Layered Schemas
- JSON-LD vs. Layered JSON Schemas
- Demonstration of Layered Schema Architecture
- On Semantic Interoperability and Self-Describing Data
- Layered Schema Architecture open source site
Screenshots/Diagrams (numbered for reference in notes above)
Decisions
Action Items