Revision as of 16:12, 3 April 2024

Draft Summit Synthesis

Overview

LLM challenges align well with Ontology capabilities
- Combining the strengths of LLMs and ontologies/knowledge graphs to overcome weaknesses of each
The Fall Series
- Discussed “hybrid systems”, provided motivation for developing them, and demonstrated applications/sandboxes based on them
- Highlighted need to keep exploring areas of collaboration, and improving both ontology and LLM development and use
- Various architectures/frameworks show different interactions between ontologies and LLMs
  - Several with explicit feedback loops

Broader Thoughts

Deborah McGuinness

The Evolving Landscape: Generative AI, Ontologies, and Knowledge Graphs

AI is changing the business landscape
- Necessary to understand strengths and weaknesses of LLMs
- “AI will not replace most knowledge professionals but many knowledge professionals who do not collaborate with generative AI will be replaced”
- “Generative AI explosion provides … a unique opportunity to shine and a time to rethink our methods”
LLMs are “usefully wrong” – providing information to help you think

Gary Marcus

No AGI (and no Trustworthy AI) without Neurosymbolic AI

Hypothesis: Scale is all you need
- Has been funded more than any other hypothesis in AI history and made progress
- But has failed to solve very many problems: AGI, autonomous driving, common sense, bias issues, reliability, trustworthiness, ...
- Tech leaders are starting to back away from this hypothesis
- Hubert Dreyfus: Climbing ever larger trees will not get one to the moon (early 1970s)
- [Deep learning is] a better ladder, but a better ladder doesn't necessarily get you to the moon
We still desperately need neurosymbolic AI but it won't be enough to get to AGI
- Intelligence is multi-faceted: we should not expect one-size-fits-all solutions
- Looking for a quick win is distracting us from the hard work that we actually need to do

Anatoly Levenchuk

Hybrid Reasoning, the Scope of Knowledge, and What Is Beyond Ontologies?

A cognitive system/agent is a cognitive architecture with a collection of KGs, LLMs and other knowledge representations
- Cognitive architecture refers to both a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science (https://en.wikipedia.org/wiki/Cognitive_architecture)
Where KGs are discriminative declarations of “what is in the world” and LLMs are generative
Both have roles in knowledge evolution
“Looking at LLMs as chatbots is the same as looking at early computers as calculators. We're seeing an emergence of a whole new computing paradigm, and it is very early.”

John Sowa and Arun Majumdar

Trustworthy Computation: Diagrammatic Reasoning With and About LLMs

Large language models cannot do reasoning, but find and apply reasoning patterns from training data
Important to note that “thinking in language” is only one form of reasoning
Systems developed by Permion use LLMs for summarization/synthesis
- But restrict responses based on the ontology
Combine LLMs with a “scaffolding model” (vector, matrix and tensor-based) => ontology and methods of diagrammatic reasoning based on conceptual graphs (CGs)
- Where ontology is derived/tailored to policies, rules, and specifications of the project or business

Fabian Neuhaus

Ontologies in the era of large language models – a perspective

Argument 1: Attempts to automate ontology development are based on a misunderstanding of what ontology engineers do
- Ontology engineers create consensus
Argument 2: There is no ontology hidden in the weights of the LLM
- Very good at navigating ambiguities and different perspectives
- But does not resolve ambiguities, have logical consistency or persistent ontological commitments

John Sowa

Without Ontology, LLMs are clueless

LLMs are a powerful technology, remarkably similar to a joke from 1900.
- Dump books in a machine, turn a crank, and expect a stream of knowledge to flow through the wires.
The results are sometimes good and sometimes disastrous.
- LLM methods are excellent for translation, useful for search, but unreliable for generating new combinations.
- A lawyer used them to find precedents for a legal case.
- It generated an imaginary precedent and created a citation that seemed to be legitimate.
- But the opposing lawyer found that the citation was false.
Ontology states criteria for testing the results of LLMs.
- Anything generated by LLMs is just a guess (hypothesis).
- If it's inconsistent with the ontology or with a verified database, it can be rejected as false.
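The testing idea above can be sketched in a few lines. This is only an illustration, not a system described in the talk: the toy ontology, the entity names, the `check` function and the closed-world treatment of unknown entities are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Toy ontology (hypothetical): entity types plus (domain, range) per property.
TYPES = {"Smith_v_Jones": "Case", "Alice": "Lawyer", "Bob": "Lawyer"}
PROPERTY_SIGNATURES = {
    "cites": ("Case", "Case"),
    "arguedBy": ("Case", "Lawyer"),
}
# Verified database of facts (hypothetical).
VERIFIED = {Triple("Smith_v_Jones", "arguedBy", "Alice")}


def check(triple: Triple) -> str:
    """Classify an LLM-generated triple as 'verified', 'rejected' or 'unverified'."""
    if triple in VERIFIED:
        return "verified"
    signature = PROPERTY_SIGNATURES.get(triple.predicate)
    if signature is None:
        return "rejected"  # predicate is not defined in the ontology
    domain, range_ = signature
    # Assumption: unknown entities have no type, so they fail the domain/range test.
    if TYPES.get(triple.subject) != domain or TYPES.get(triple.obj) != range_:
        return "rejected"
    return "unverified"  # consistent with the ontology, but still only a hypothesis
```

Under these assumptions, a citation to a case that the verified store has never heard of, such as `Triple("Smith_v_Jones", "cites", "Doe_v_Roe")`, is rejected, while a well-typed but unconfirmed statement stays a hypothesis until someone checks it.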

A look across the industry

Kurt Cagle

Complementary Thinking: Language Models, Ontologies and Knowledge Graphs

Mapping LLMs to ontologies/KGs
- Matching LLM concepts to KG instances over specific classes such as schema.org or NIEM
- Using a RAG (Retrieval-Augmented Generation) plug-in to communicate with an ontology/KG and add to the node-sets or control output transformation
- Reading Turtle, RDF/XML and JSON-LD
Mapping ontologies/KGs to LLMs
- Using URI/IRI references in data and obtaining results with those references
- Adding KG embeddings (vector space representations) to LLM training corpus
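A rough sketch of the retrieval side of these mappings, assuming a tiny in-memory KG and naive string matching for entity linking. The `example.org` IRIs, the data and the function names are invented for illustration; a real plug-in would query a graph store and use a proper entity linker.

```python
# Tiny in-memory KG keyed by IRI (all example.org IRIs and values are made up).
KG = [
    ("https://example.org/id/acme", "http://schema.org/name", "Acme Corp"),
    ("https://example.org/id/acme", "http://schema.org/foundingDate", "1999"),
    ("https://example.org/id/globex", "http://schema.org/name", "Globex"),
]
NAME = "http://schema.org/name"


def link_entities(question: str) -> list[str]:
    """Return IRIs whose schema:name label occurs in the question (naive match)."""
    text = question.lower()
    return [s for s, p, o in KG if p == NAME and o.lower() in text]


def retrieve(iris: list[str]) -> list[tuple[str, str, str]]:
    """Collect every triple whose subject is one of the linked IRIs."""
    return [t for t in KG if t[0] in iris]


def build_prompt(question: str) -> str:
    """Assemble a prompt that grounds the model in the retrieved triples."""
    facts = retrieve(link_entities(question))
    lines = [f'<{s}> <{p}> "{o}" .' for s, p, o in facts]
    return (
        "Answer using only these facts and cite the subject IRI:\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )
```

Because the facts are passed with their IRIs, an answer that cites one can be traced back to the node it came from.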

Tony Seale

How Ontologies Can Unlock the Potential of Large Language Models for Business

LLM and ontology “reinforcing feedback loop of continuous improvement”
- Using ontology/KG to place “guardrails” on LLM outputs
- Using LLMs to aid in maintenance and extension of ontology
Information as a continuous stream (~LLMs) or discrete chunks (~KGs)
- Analogy to System 1 (intuitive/instinctual) and System 2 (reasoning based) thinking

Yuan He

DeepOnto: A Python Package for Ontology Engineering with Deep Learning and Language Models

DeepOnto
- Python package for ontology engineering with deep learning and LMs

Hamed Babaei Giglou

LLMs4OL: Large Language Models for Ontology Learning

Results:
- We explored LLMs' potential for OL through our introduced conceptual framework, LLMs4OL.
- Extensive experiments on 11 LLMs across three OL tasks demonstrate the paradigm’s proof of concept.
- The obtained empirical results show that foundational LLMs are not sufficiently suitable for ontology construction that entails a high degree of reasoning skills and domain expertise.
- When LLMs are effectively fine-tuned, they just might work as suitable assistants for ontology construction, alleviating the knowledge acquisition bottleneck.
- A codebase with detailed results is shared: https://github.com/HamedBabaei/LLMs4OL
Future:
- Still, we need to explore more recent LLMs.
- Incorporate more ontologies in this study.
- Build a benchmark dataset that considers more domains.
- Optimize the three LLMs4OL tasks.

Demos of hybrid systems

Evren Sirin

Stardog Voicebox: LLM-Powered Question Answering with Knowledge Graphs

Stardog Voicebox combines LLM and graph database technology to:
- Take a description of an ontology and create it
- Turn a natural language query into SPARQL
  - Provide context for decisions and debug/repair queries
Built on:
- Open-source foundational model, MPT-30B
- Fine-tuned with ~20K SPARQL queries
- Vector embedding and search via MiniLM-L6-v2 language model
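To make the "vector search, then SPARQL" idea concrete, here is a minimal sketch. It is not Stardog's implementation: a bag-of-words vector stands in for a sentence-embedding model, a fixed template stands in for the fine-tuned LLM, and the `ex:` schema terms and namespace are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical schema terms with short descriptions to match against.
SCHEMA = {
    "ex:Employee": "employee staff worker person who works",
    "ex:Product": "product item goods sold",
    "ex:Customer": "customer client buyer",
}


def embed(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_class(question: str) -> str:
    """Return the schema class whose description is closest to the question."""
    q = embed(question)
    return max(SCHEMA, key=lambda term: cosine(q, embed(SCHEMA[term])))


def to_sparql(question: str) -> str:
    """Fill a fixed SELECT template with the retrieved class."""
    return (
        "PREFIX ex: <https://example.org/ns#>\n"
        f"SELECT ?s WHERE {{ ?s a {best_class(question)} . }}"
    )
```

Retrieving the schema term first and only then generating the query keeps the generated SPARQL tied to vocabulary that actually exists in the graph.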

Prasad Yalamanchi

Harvest Knowledge From Language - Harness the power of Large Language Models and Semantic Technology

TextDistil
- Inputs – text documents; Outputs – N-Quads files and JSON
- Models trained on domain-specific variables, and training data labeled using taxonomy
- Ontology for organization/semantics (human defined)
- Query in NL parsed to ontology concepts and used to generate query to KG
- Triples returned with provenance from ingested documents
- LLM used to summarize response

Andrea Westerinen

Populating Knowledge Graphs: The Confluence of Ontology and Large Language Models

Overview of open-source tooling to parse news articles (Deep Narrative Analysis, DNA)
- Create knowledge stores with data from text stored in RDF graphs
- Enabling aggregation of textual information within and across documents
- To efficiently compare and analyze collections of text to understand patterns, trends, …
Prompts sent to OpenAI chat completion API for:
- Narrative analysis
  - Rhetorical devices and viewpoint interpretations
- Sentence analysis
  - Linguistics (tense, voice, errors, …), rhetorical devices and mapping to ontology
LLM JSON responses (already mapped to the ontology) used to generate RDF
- Which is stored in graph database
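The JSON-to-RDF step might look roughly like the sketch below. The JSON field names (`id`, `class`, `tense`, `voice`) and the `example.org` namespace are assumptions chosen for illustration and are not taken from the DNA tooling; the output is plain N-Triples so that no RDF library is needed.

```python
import json

EX = "https://example.org/dna#"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def json_to_ntriples(response: str) -> list[str]:
    """Turn one LLM JSON response into N-Triples lines."""
    data = json.loads(response)
    subject = f"<{EX}{data['id']}>"
    # The class name is assumed to already be an ontology term.
    lines = [f"{subject} <{RDF_TYPE}> <{EX}{data['class']}> ."]
    for key in ("tense", "voice"):
        if key in data:
            lines.append(f"{subject} <{EX}{key}> {literal(data[key])} .")
    return lines
```

The resulting lines can be loaded into any graph database that accepts N-Triples.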

Deborah McGuinness

Applications of LLMs at RPI
- Collaborative KG generation by leveraging LLMs for refinement and population (value restrictions and instances) of an existing ontology, in partnership with a human
  - Enhancing wine and cheese ontology
  - But could also provide concepts that are a starting point for a new ontology, for human consideration
- LLM/KG Fact Checker (ChatBS) “sandbox” with questions submitted (multiple times) to OpenAI completion API and entity linking to Wikidata for validation
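One way to picture the "ask multiple times, then validate" pattern is the sketch below. It is not the ChatBS code: `ask` is any caller-supplied function that returns a model answer, and a plain dictionary stands in for the value obtained from a linked Wikidata entity.

```python
from collections import Counter
from typing import Callable


def ask_repeatedly(ask: Callable[[str], str], question: str, runs: int = 5) -> tuple[str, float]:
    """Ask the same question several times; return the top answer and its share of runs."""
    answers = [ask(question).strip().lower() for _ in range(runs)]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / runs


def validate(answer: str, reference: dict[str, str], key: str) -> bool:
    """Compare the majority answer with a reference value (stand-in for Wikidata)."""
    return reference.get(key, "").lower() == answer
```

A low agreement share is a signal that the answer is unstable and deserves a closer look, even before it is compared with the reference value.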

Till Mossakowski

Modular design patterns for neural-symbolic integration: refinement and combination

Neural networks can extend ontologies of structured objects: from neuro to symbolic
Ontology pre-training can improve transformer performance: from symbolic to neuro
We can beat purely symbolic and purely neural baselines
Design patterns as systematic building blocks => towards a theory of neuro-symbolic engineering
Future work: Novel neural embeddings for ontologies

Markus J. Buehler

Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning

Navigating generated knowledge graphs can result in new scientific insights


OntologySummit2024/Synthesis: Difference between revisions

Ontolog Forum
