OntologySummit2024/Synthesis

Ontolog Forum

Revision as of 16:12, 3 April 2024

Draft Summit Synthesis

Overview

  • LLM challenges align well with ontology capabilities
    • Combining the strengths of LLMs and ontologies/knowledge graphs to overcome weaknesses of each
  • The Fall Series
    • Discussed “hybrid systems”, provided motivation for developing them, and demonstrated applications/sandboxes based on them
    • Highlighted the need to keep exploring areas of collaboration and to improve both ontology and LLM development and use
    • Various architectures/frameworks show different interactions between ontologies and LLMs
    • Several include explicit feedback loops

Broader Thoughts

Deborah McGuinness

The Evolving Landscape: Generative AI, Ontologies, and Knowledge Graphs

  • AI is changing the business landscape
    • Necessary to understand strengths and weaknesses of LLMs
    • “AI will not replace most knowledge professionals but many knowledge professionals who do not collaborate with generative AI will be replaced”
    • “Generative AI explosion provides … a unique opportunity to shine and a time to rethink our methods”
  • LLMs are “usefully wrong” – providing information to help you think

Gary Marcus

No AGI (and no Trustworthy AI) without Neurosymbolic AI

  • Hypothesis: Scale is all you need
    • Has been funded more than any other hypothesis in AI history and made progress
    • But has failed to solve very many problems: AGI, autonomous driving, common sense, bias issues, reliability, trustworthiness, ...
    • Tech leaders are starting to back away from this hypothesis
    • Hubert Dreyfus: Climbing ever larger trees will not get one to the moon (early 1970s)
    • [Deep learning is] a better ladder, but a better ladder doesn't necessarily get you to the moon
  • We still desperately need neurosymbolic AI but it won't be enough to get to AGI
    • Intelligence is multi-faceted: we should not expect one-size-fits-all solutions
    • Looking for a quick win is distracting us from the hard work that we actually need to do

Anatoly Levenchuk

Hybrid Reasoning, the Scope of Knowledge, and What Is Beyond Ontologies?

  • A cognitive system/agent is a cognitive architecture with a collection of KGs, LLMs and other knowledge representations
    • Cognitive architecture refers to both a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science (https://en.wikipedia.org/wiki/Cognitive_architecture)
  • KGs are discriminative declarations of “what is in the world”, whereas LLMs are generative
  • Both have roles in knowledge evolution
  • “Looking at LLMs as chatbots is the same as looking at early computers as calculators. We're seeing an emergence of a whole new computing paradigm, and it is very early.”

John Sowa and Arun Majumdar

Trustworthy Computation: Diagrammatic Reasoning With and About LLMs

  • Large language models cannot reason; they find and apply reasoning patterns from their training data
  • Important to note that “thinking in language” is only one form of reasoning
  • Systems developed by Permion use LLMs for summarization/synthesis
    • But restrict responses based on the ontology
  • Combine LLMs with a “scaffolding model” (vector, matrix and tensor-based) => ontology and methods of diagrammatic reasoning based on conceptual graphs (CGs)
    • The ontology is derived from, or tailored to, the policies, rules, and specifications of the project or business

Fabian Neuhaus

Ontologies in the era of large language models – a perspective

  • Argument 1: Attempts to automate ontology development are based on a misunderstanding of what ontology engineers do
    • Ontology engineers create consensus
  • Argument 2: There is no ontology hidden in the weights of the LLM
    • LLMs are very good at navigating ambiguities and different perspectives
    • But they do not resolve ambiguities, maintain logical consistency, or hold persistent ontological commitments

John Sowa

Without Ontology, LLMs are clueless

  • LLMs are a powerful technology, remarkably similar to a joke in 1900.
    • Dump books in a machine, turn a crank, and expect a stream of knowledge to flow through the wires.
  • The results are sometimes good and sometimes disastrous.
    • LLM methods are excellent for translation, useful for search, but unreliable for generating new combinations.
    • A lawyer used an LLM to find precedents for a legal case.
    • It generated an imaginary precedent with a citation that seemed legitimate.
    • But the opposing lawyer found that the citation was false.
  • Ontology states criteria for testing the results of LLMs.
    • Anything generated by LLMs is just a guess (hypothesis).
    • If it's inconsistent with the ontology or with a verified database, it can be rejected as false.
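
Sowa's criterion can be sketched as a simple consistency filter. This is a minimal illustration under stated assumptions, not Sowa's or Permion's method: the property signatures, the verified type table, and all entity names are hypothetical, and the check covers only domain/range typing and presence in the verified data. Passing the check means the hypothesis was not rejected, not that it is true.

```python
# Hypothetical sketch: every LLM output is a hypothesis; reject it if it
# conflicts with the ontology or with a verified database.
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical ontology: property -> (domain class, range class)
SIGNATURES = {
    "citesCase": ("LegalCase", "LegalCase"),
    "decidedBy": ("LegalCase", "Court"),
}

# Hypothetical verified database: entity -> class
VERIFIED = {
    "Smith_v_Jones": "LegalCase",
    "Doe_v_Roe": "LegalCase",
    "Appeals_Court": "Court",
}


def check(t: Triple) -> tuple[bool, str]:
    """Return (not_rejected, reason). Passing does not prove the triple true."""
    if t.predicate not in SIGNATURES:
        return False, f"unknown property {t.predicate}"
    domain, range_ = SIGNATURES[t.predicate]
    for entity, expected in ((t.subject, domain), (t.obj, range_)):
        actual = VERIFIED.get(entity)
        if actual is None:
            return False, f"{entity} is not in the verified database"
        if actual != expected:
            return False, f"{entity} is a {actual}, but {t.predicate} expects a {expected}"
    return True, "no conflict found"


for hypothesis in [
    Triple("Smith_v_Jones", "citesCase", "Doe_v_Roe"),
    Triple("Smith_v_Jones", "citesCase", "Imaginary_v_Precedent"),  # invented citation
    Triple("Doe_v_Roe", "decidedBy", "Smith_v_Jones"),  # wrong kind of entity
]:
    print(hypothesis, check(hypothesis))
```

The second hypothesis is rejected because the cited case is absent from the verified data, which mirrors the fabricated-precedent example above.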

A look across the industry

Kurt Cagle

Complementary Thinking: Language Models, Ontologies and Knowledge Graphs

  • Mapping LLMs to ontologies/KGs
    • Matching LLM concepts to KG instances of classes from specific vocabularies such as schema.org or NIEM
    • Using a RAG (Retrieval-Augmented Generation) plug-in to communicate with an ontology/KG and add to the node-sets or control output transformation
    • Reading Turtle, RDF/XML and JSON-LD
  • Mapping ontologies/KGs to LLMs
    • Using URI/IRI references in data and obtaining results with those references
    • Adding KG embeddings (vector space representations) to LLM training corpus
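
The retrieval half of such a RAG plug-in might look like the sketch below. It assumes a tiny in-memory KG using schema.org IRIs and ranks triples by keyword overlap with the question; a real system would more likely query a SPARQL endpoint or use vector search. The example.org IRIs, the triples, and the prompt wording are hypothetical, and the LLM call itself is omitted.

```python
# Hypothetical retrieval step for a RAG plug-in over a small knowledge graph.
# Keyword overlap stands in for real retrieval (SPARQL or vector search).
import re

SCHEMA = "https://schema.org/"
EX = "https://example.org/"  # hypothetical instance namespace

TRIPLES = [
    (EX + "acme", SCHEMA + "name", "Acme Corporation"),
    (EX + "acme", SCHEMA + "foundingDate", "1999-04-01"),
    (EX + "acme", SCHEMA + "founder", EX + "jane"),
    (EX + "jane", SCHEMA + "name", "Jane Doe"),
]


def is_iri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def tokens(term: str) -> set[str]:
    """Lowercase word tokens; IRIs contribute only their local name."""
    if is_iri(term):
        term = term.rstrip("/").rsplit("/", 1)[-1]
    term = re.sub(r"([a-z])([A-Z])", r"\1 \2", term)  # foundingDate -> founding Date
    return set(re.findall(r"[a-z0-9]+", term.lower()))


def retrieve(question: str, k: int = 3) -> list[tuple[str, str, str]]:
    """Return up to k triples sharing the most tokens with the question."""
    q = tokens(question)
    scored = []
    for triple in TRIPLES:
        overlap = len(q & set().union(*(tokens(t) for t in triple)))
        if overlap:
            scored.append((overlap, triple))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [triple for _, triple in scored[:k]]


def build_prompt(question: str) -> str:
    """Put retrieved facts, with full IRIs, in front of the question."""
    def fmt(t: str) -> str:
        return f"<{t}>" if is_iri(t) else f'"{t}"'
    facts = "\n".join(" ".join(fmt(t) for t in triple) + " ." for triple in retrieve(question))
    return f"Answer using only these facts and cite their IRIs:\n{facts}\n\nQuestion: {question}"
```

Keeping full IRIs in the prompt context is one way to get answers that carry URI/IRI references back into the KG, as the second group of bullets suggests.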

Tony Seale

How Ontologies Can Unlock the Potential of Large Language Models for Business

  • LLM and ontology “reinforcing feedback loop of continuous improvement”
    • Using ontology/KG to place “guardrails” on LLM outputs
    • Using LLMs to aid in maintenance and extension of ontology
  • Information as a continuous stream (~LLMs) or discrete chunks (~KGs)
    • Analogy to System 1 (intuitive/instinctual) and System 2 (reasoning based) thinking

Yuan He

DeepOnto: A Python Package for Ontology Engineering with Deep Learning and Language Models

  • DeepOnto
  • Python package for ontology engineering with deep learning and LMs

Hamed Babaei Giglou

LLMs4OL: Large Language Models for Ontology Learning

  • Results:
    • We explored LLMs' potential for ontology learning (OL) through our conceptual framework, LLMs4OL.
    • Extensive experiments on 11 LLMs across three OL tasks demonstrate a proof of concept for the paradigm.
    • The empirical results show that foundational LLMs are not sufficiently suitable for ontology construction that requires a high degree of reasoning skill and domain expertise.
    • When effectively fine-tuned, LLMs might work as suitable assistants for ontology construction, alleviating the knowledge acquisition bottleneck.
    • A codebase with detailed results is shared: https://github.com/HamedBabaei/LLMs4OL
  • Future:
    • Explore more recent LLMs.
    • Incorporate more ontologies in this study.
    • Build a benchmark dataset that considers more domains.
    • Optimize the three LLMs4OL tasks.

Demos of hybrid systems

Evren Sirin

Stardog Voicebox: LLM-Powered Question Answering with Knowledge Graphs

  • Stardog Voicebox combines LLM and graph database technology to:
    • Take a description of an ontology and create it
    • Turn natural language queries into SPARQL
    • Provide context for decisions and debug/repair queries
  • Built on:
    • Open-source foundation model, MPT-30B
    • Fine-tuned with ~20K SPARQL queries
    • Vector embedding and search via MiniLM-L6-v2 language model
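
Vector search of this kind ranks stored items by the cosine similarity of their embeddings to a query embedding. The sketch below is an assumption about how such a step could work, not Voicebox's implementation: hand-written 3-dimensional vectors stand in for real sentence embeddings (a model such as MiniLM-L6-v2 produces much longer vectors), and the ontology labels are hypothetical.

```python
# Hypothetical sketch of embedding-based search: rank stored items by cosine
# similarity to a query vector. These 3-dimensional vectors are hand-written
# stand-ins for embeddings produced by a sentence embedding model.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical index: ontology term label -> embedding
INDEX = {
    "Customer": [0.9, 0.1, 0.0],
    "Order": [0.1, 0.9, 0.1],
    "Product": [0.0, 0.2, 0.9],
}


def search(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Return the k most similar labels with their cosine scores."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in INDEX.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, search([0.8, 0.2, 0.0]) ranks Customer first and Order second. In a question-answering pipeline, the top matches could plausibly be passed to the model as schema context when generating SPARQL.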

Prasad Yalamanchik

Harvest Knowledge From Language - Harness the power of Large Language Models and Semantic Technology

  • TextDistil
    • Inputs – text documents; Outputs – N-Quads files and JSON
    • Models trained on domain-specific variables, with training data labeled using a taxonomy
    • Ontology for organization/semantics (human defined)
    • Natural language queries parsed into ontology concepts, which are used to generate a query to the KG
    • Triples returned with provenance from ingested documents
    • LLM used to summarize response
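
Provenance can be illustrated with N-Quads, where the optional fourth term names the graph that a triple belongs to. The sketch below assumes, hypothetically, that each source document gets its own graph IRI, so returning a quad's graph also returns where the triple was extracted from. The label-to-concept table stands in for the NL parsing step; all example.org IRIs and data are invented, and none of this is TextDistil's API.

```python
# Hypothetical sketch: extracted triples are stored as N-Quads whose graph
# term is the IRI of the source document, so every answer keeps provenance.
from typing import NamedTuple

EX = "https://example.org/"  # hypothetical namespace


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str  # graph IRI = source document (provenance)


def term(value: str) -> str:
    """Serialize an IRI as <...> or anything else as an escaped literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_nquads(quads: list[Quad]) -> str:
    return "\n".join(f"{term(q.s)} {term(q.p)} {term(q.o)} <{q.g}> ." for q in quads)


# Stand-in for parsing a natural language query into ontology concepts
LABELS = {"aspirin": EX + "Aspirin", "headache": EX + "Headache"}


def concepts_in(question: str) -> list[str]:
    text = question.lower()
    return [iri for label, iri in LABELS.items() if label in text]


def answer(question: str, quads: list[Quad]) -> list[tuple[tuple[str, str, str], str]]:
    """Return (triple, source document) pairs mentioning any query concept."""
    wanted = set(concepts_in(question))
    return [((q.s, q.p, q.o), q.g) for q in quads if q.s in wanted or q.o in wanted]


STORE = [
    Quad(EX + "Aspirin", EX + "treats", EX + "Headache", EX + "doc/report-1"),
    Quad(EX + "Aspirin", EX + "label", "acetylsalicylic acid", EX + "doc/report-2"),
    Quad(EX + "Ibuprofen", EX + "treats", EX + "Fever", EX + "doc/report-3"),
]
```

Here answer("What treats a headache?", STORE) returns the single matching triple together with EX + "doc/report-1" as its source; an LLM could then summarize those triples for the user.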

Andrea Westerinen

Populating Knowledge Graphs: The Confluence of Ontology and Large Language Models

  • Overview of open-source tooling to parse news articles (Deep Narrative Analysis, DNA)
    • Create knowledge stores with data from text stored in RDF graphs
    • Enabling aggregation of textual information within and across documents
    • To efficiently compare and analyze collections of text to understand patterns, trends, …
  • Prompts sent to OpenAI chat completion API for:
    • Narrative analysis
    • Rhetorical devices and viewpoint interpretations
    • Sentence analysis
    • Linguistics (tense, voice, errors, …), rhetorical devices and mapping to ontology
  • LLM JSON responses (already mapped to the ontology) used to generate RDF
    • Which is stored in graph database
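
The final step, turning ontology-mapped JSON into RDF, can be sketched as follows. The JSON keys, the allowed terms, and the ex: namespace are hypothetical and are not DNA's actual prompt format or ontology. The code refuses any value that is not a known ontology term and otherwise writes a Turtle snippet that could be loaded into a graph database.

```python
# Hypothetical sketch: convert an LLM's JSON sentence analysis (already
# mapped to ontology terms) into Turtle, rejecting unknown terms first.
import json

NS = "https://example.org/dna#"  # hypothetical ontology namespace

# Hypothetical ontology vocabulary the prompt asks the LLM to choose from
ALLOWED = {
    "tense": {"Past", "Present", "Future"},
    "voice": {"Active", "Passive"},
    "rhetorical_devices": {"Hyperbole", "Repetition", "RhetoricalQuestion"},
}


def response_to_turtle(raw: str, sentence_iri: str) -> str:
    """Convert one JSON sentence analysis into a Turtle snippet."""
    data = json.loads(raw)
    devices = data.get("rhetorical_devices", [])
    # Reject any value that is not an ontology term before emitting RDF
    for key, values in (("tense", [data["tense"]]), ("voice", [data["voice"]]),
                        ("rhetorical_devices", devices)):
        for v in values:
            if v not in ALLOWED[key]:
                raise ValueError(f"{v!r} is not a known {key} term")
    props = [f"    ex:tense ex:{data['tense']}", f"    ex:voice ex:{data['voice']}"]
    props += [f"    ex:hasRhetoricalDevice ex:{d}" for d in devices]
    body = " ;\n".join(props) + " ."
    return f"@prefix ex: <{NS}> .\n\n<{sentence_iri}> a ex:Sentence ;\n{body}"
```

Validating against the vocabulary before serializing keeps a malformed or off-ontology LLM response from producing invalid Turtle or introducing unknown terms into the graph.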

Deborah McGuinness

  • Applications of LLMs at RPI
    • Collaborative KG generation by leveraging LLMs for refinement and population (value restrictions and instances) of an existing ontology, in partnership with a human
      • Enhancing wine and cheese ontology
      • But could also provide concepts that are a starting point for a new ontology, for human consideration
    • LLM/KG Fact Checker (ChatBS) “sandbox” with questions submitted (multiple times) to OpenAI completion API and entity linking to Wikidata for validation
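
Submitting the same question several times makes it possible to check how consistent the model's answers are. The sketch below shows one simple agreement measure over answers that have already been collected; the normalization, the 0.6 threshold, and the sample answers are assumptions, the API calls are omitted, and the Wikidata entity-linking step is not shown.

```python
# Hypothetical sketch: measure how consistently repeated submissions of the
# same question were answered. API calls and Wikidata linking are omitted.
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase, trim spaces and trailing periods, collapse whitespace."""
    return " ".join(answer.lower().strip(" .").split())


def agreement(answers: list[str], threshold: float = 0.6) -> tuple[str, float, bool]:
    """Return (most common answer, share of runs agreeing, passes threshold)."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalize(a) for a in answers)
    top, n = counts.most_common(1)[0]
    ratio = n / len(answers)
    return top, ratio, ratio >= threshold


# Hypothetical answers from submitting one question four times
print(agreement(["Paris.", "paris", "Lyon", "Paris"]))  # → ('paris', 0.75, True)
```

Low agreement does not prove an answer wrong, but it is a cheap signal for which answers most need checking against an external source such as Wikidata.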

Till Mossakowski

Modular design patterns for neural-symbolic integration: refinement and combination

  • Neural networks can extend ontologies of structured objects: from neuro to symbolic
  • Ontology pre-training can improve transformer performance: from symbolic to neuro
  • We can beat purely symbolic and purely neural baselines
  • Design patterns as systematic building blocks => towards a theory of neuro-symbolic engineering
  • Future work: Novel neural embeddings for ontologies

Markus J. Buehler

Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning

  • Navigating generated knowledge graphs can result in new scientific insights
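
One simple way to navigate a generated graph is to look for the shortest path connecting two concepts, since the intermediate nodes may suggest how they are related. The sketch below uses breadth-first search over a tiny hypothetical concept graph; the edges are invented for illustration and are not taken from Buehler's work.

```python
# Hypothetical sketch: find the shortest chain of relations linking two
# concepts in a generated knowledge graph, using breadth-first search.
from collections import deque
from typing import Optional

# Invented, undirected concept edges for illustration only
EDGES = [
    ("spider web", "silk"),
    ("silk", "hierarchical structure"),
    ("bone", "hierarchical structure"),
    ("hierarchical structure", "toughness"),
    ("toughness", "impact resistance"),
]


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Return the node sequence of a shortest path, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


GRAPH = build_graph(EDGES)
print(" -> ".join(shortest_path(GRAPH, "spider web", "impact resistance")))
```

Breadth-first search finds a path with the fewest hops in an unweighted graph; a weighted or embedding-aware search would be needed if relations carried different strengths.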