Ontolog Forum

Session: A look across the industry, Part 2
Duration: 1 hour
Date/Time: 25 Oct 2023 16:00 GMT
  9:00am PDT/12:00pm EDT
  4:00pm GMT/5:00pm CST
Convener: Andrea Westerinen and Mike Bennett

Ontology Summit 2024 Fall Series A look across the industry, Part 2

Agenda

  • Evren Sirin, Stardog CTO and lead for their new Voicebox offering
    • Title: Stardog Voicebox: LLM-Powered Question Answering with Knowledge Graphs
    • Abstract: Large Language Models (LLMs) and Generative AI technologies have caused a shift in all areas of information technology but especially for question answering use cases. Leveraging LLMs for question answering can help fully democratize enterprise analytics and data access. However, using LLMs with enterprise data brings significant challenges around security, privacy, accuracy, and explainability. In this talk we will present Stardog Voicebox which leverages an open-source foundational LLM to build, manage, and query knowledge graphs using ordinary language. The answers to user questions come directly from the knowledge graph, providing complete traceability and access control. Stardog Voicebox combines statistical reasoning in the form of LLMs with logical reasoning in knowledge graphs, providing a powerful hybrid reasoning system with a natural language interface.
    • Slides
  • Yuan He, Key contributor to DeepOnto, a package for ontology engineering with deep learning
    • Title: DeepOnto: A Python Package for Ontology Engineering with Deep Learning and Language Models
    • Abstract: Integrating deep learning techniques, particularly language models (LMs), with knowledge representations like ontologies has raised widespread attention, urging the need for a platform that supports both paradigms. However, deep learning frameworks like PyTorch and TensorFlow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present DeepOnto, a Python package designed for ontology engineering with deep learning. The package encompasses a core ontology processing module founded on the widely-recognized and reliable OWL API, encapsulating its fundamental features in a more “Pythonic” manner and extending its capabilities to incorporate other essential components including reasoning, verbalization, normalization, projection, taxonomy, and more. Building on this module, DeepOnto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methods, primarily pre-trained LMs.
    • Slides
  • Video Recording

Conference Call Information

  • Date: Wednesday, 25 October 2023
  • Start Time: 9:00am PDT / 12:00pm EDT / 6:00pm CEST / 5:00pm BST / 1600 UTC
  • Expected Call Duration: 1 hour
  • Video Conference URL: https://bit.ly/48lM0Ik
    • Conference ID: 876 3045 3240
    • Passcode: 464312

The unabbreviated URL is: https://us02web.zoom.us/j/87630453240?pwd=YVYvZHRpelVqSkM5QlJ4aGJrbmZzQT09

Participants

Complete list of participants was not captured

Discussion

  • (Question before chat recording) How is reasoning supported in Stardog?
    • Andrea Westerinen: Stardog includes a reasoner. (One of the precursors to Stardog was the Pellet reasoner, which could be used in Protege.)
  • Ravi Sharma: How does Stardog handle the contention between datastore APIs and app APIs? Namely, how does it distinguish any discrepancies?
    • Andrea Westerinen: I am not sure that I understand your question. The datastore APIs are REST based.
  • Andrea Westerinen: Making SPARQL more user friendly (NLP -> SPARQL) is valuable.
    • Michael DeBellis: Andrea, can you say more about NLP -> SPARQL? Is this a new spec, a book,??? Have a link?
    • Michael DeBellis: Sorry, I misread your comment. I thought you said "is available" not "is valuable" and thought it was some new paper or spec. Anyway, yes I agree this is critical to making Semantic Web get traction in the real world and shouldn't be too difficult. Taking natural language and generating SPARQL from it.
    • Andrea Westerinen: This is what Evren will talk about ... Voicebox.
  • Andrea Westerinen: Evren is discussing one example of Stardog knowledge kits - related to beers, ingredients and customers.
    • Andrea Westerinen: Plus there are inference rules that can also be used in a query.
    • Riley Moher: So are we generating a new relation whose sort constraints are determined based on semantic similarity of existing relations?
    • Andrea Westerinen: You can do a PATH analysis, but this is a discussion of an inference rule that I believe is pre-defined.
  • Todd Schneider: When asking for a customer's supplier, what are they supplying?
    • Andrea Westerinen: Any ingredient used in a product purchased by a customer
    • Andrea Westerinen: Kind of weird, but I think that the point is using inference to help with NLP translation
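The inference described in this thread (a customer's supplier is the supplier of any ingredient used in a product that the customer purchased) can be sketched as a chain of joins. This is only an illustration in plain Python: the facts, the relation names, and the function `suppliers_of_customer` are hypothetical and are not taken from the Stardog knowledge kit or its rule syntax.

```python
# Toy facts; every name here is a hypothetical illustration, not Stardog data.
purchased = {("alice", "pale_ale"), ("bob", "stout")}
uses_ingredient = {("pale_ale", "hops"), ("pale_ale", "barley"), ("stout", "barley")}
supplies = {("hop_farm", "hops"), ("malt_co", "barley")}


def suppliers_of_customer(customer):
    """Infer suppliers by chaining customer -> product -> ingredient -> supplier."""
    products = {p for c, p in purchased if c == customer}
    ingredients = {i for p, i in uses_ingredient if p in products}
    return {s for s, i in supplies if i in ingredients}


print(sorted(suppliers_of_customer("alice")))  # ['hop_farm', 'malt_co']
print(sorted(suppliers_of_customer("bob")))    # ['malt_co']
```

A rule like this gives a natural-language question such as "who supplies this customer?" a single derived relation to target, which seems to be the point made about helping the NLP translation.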
  • Michael DeBellis: I would like to get more detail on how you integrated vector DB with triplestore
  • Douglas Miles: What opensrc LLMs did you find adequately trained?
    • Andrea Westerinen: Evren is reporting use of MPT-30B trained by MosaicML/Databricks
    • Michael DeBellis: For open source LLMs the best place to look IMO is: https://huggingface.co/
    • Douglas Miles: So you tried several openSrc Models and only one was barely adequate?!
    • Evren Sirin: This was based on an analysis of smaller models (7B). And, this is an ongoing process.
    • Douglas Miles: I am doing some NL<->CLIF and I have so far only found ChatGPT-4 adequate (ChatGPT-3.5 was absolutely a waste of time). This is good news if MPT-30B is worth a try for this.
    • Riley Moher: Very interested in NL <-> CLIF , what is the nature of the work?
    • Evren Sirin: The LLM stage is changing rapidly. There are some newcomers like Mistral that are promising. We’ve gotten comparable results with Llama 2 as well.
    • Douglas Miles: I won't be surprised if Llama X or whichever will be as good as ChatGPT-4 within a short time from now
    • Douglas Miles: Just a small amount: seeing how good it is, and showing someone they probably want to use CLIF over the overly popular ARM for KR
    • Douglas Miles: I had lots of good experience with CLIF.. It is at least expressive enough for English... whereas ARM seems not to be
    • John Sowa: CLIF is much easier to map to and from English than OWL.
    • Riley Moher: More expressive for sure
    • John Sowa: CLIF includes full FOL plus a version of HOL.
  • Todd Schneider: The results of computing an inference may or may not be materialized.
    • Evren Sirin: Yes, in theory inferences can be materialized but in Stardog we only support query-time inferencing by rewriting the user query and executing with a Datalog engine.
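The difference between materializing inferences and answering them at query time by rewriting the query can be illustrated with a toy subclass hierarchy. This is a minimal sketch under stated assumptions: the data and all function names are hypothetical, and it does not reproduce Stardog's rewriting or its Datalog engine.

```python
# Hypothetical toy schema and data.
subclass_of = {"Ale": "Beer", "Stout": "Beer", "Beer": "Beverage"}
types = {("x1", "Ale"), ("x2", "Stout"), ("x3", "Beverage")}


def superclasses(cls):
    """Yield cls followed by all of its ancestors."""
    while cls is not None:
        yield cls
        cls = subclass_of.get(cls)


def materialize(facts):
    """Materialization: compute and store every inferred type assertion."""
    return {(x, sup) for x, c in facts for sup in superclasses(c)}


def rewrite(cls):
    """Query rewriting: expand the queried class into itself plus all subclasses."""
    return {c for c in set(subclass_of) | {cls} if cls in superclasses(c)}


def query_with_rewriting(cls):
    """Answer against the unmodified data using the expanded query."""
    wanted = rewrite(cls)
    return {x for x, c in types if c in wanted}


def query_materialized(cls):
    """Answer against the data extended with inferred facts."""
    return {x for x, c in materialize(types) if c == cls}


print(sorted(query_with_rewriting("Beer")))  # ['x1', 'x2']
print(sorted(query_materialized("Beer")))    # ['x1', 'x2']
```

Both strategies return the same answers here; rewriting leaves the stored data untouched and pays the cost at query time, while materialization pays it when data is loaded or changed.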
  • Pete Rivett: Have you considered using an enterprise-specific KG to train a specific LLM?
    • Evren Sirin: Yes, this is definitely on our roadmap. We are starting with a general-purpose LLM that can be used with any KG but more domain-specific fine-tuning will certainly improve results.
  • Ravi Sharma: Example of Vector Embeddings?
    • Yuan He: SentenceBERT / FAISS, for example.
    • Evren Sirin: We use MiniLM which is a small language model (or more correctly a sentence transformer) for creating vectors from text.
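As a rough illustration of what "creating vectors from text" buys you, the sketch below embeds short strings and picks the nearest one by cosine similarity. The `embed` function is a deliberately crude word-count stand-in, assumed here only so the example is self-contained; a sentence transformer such as MiniLM would produce dense learned vectors instead, and a library such as FAISS would handle the nearest-neighbour search at scale. The labels and function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector (stand-in for a learned sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical labels that might come from a knowledge graph.
labels = ["beer supplier", "customer order", "hop ingredient"]


def nearest(query):
    """Return the label whose vector is closest to the query vector."""
    q = embed(query)
    return max(labels, key=lambda label: cosine(q, embed(label)))


print(nearest("which supplier delivers beer"))  # beer supplier
```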
  • Bart Gajderowicz: Can you expand on the traceability aspect?
    • Evren Sirin: I tried to showcase this at the end a little. We can see the query used, the data sources and data elements that contributed to the answer, etc. Traceability and explainability are still very low-level (they require RDF and SPARQL knowledge) but that’s something we plan to tackle later.
  • Andrea Westerinen: What happens if the concept is not known to the KG?
    • Riley Moher: What if the natural language query cannot be expressed with SPARQL?
    • Evren Sirin: At this stage we simply say the question cannot be answered. Our goal is to use the conversational aspects to clarify the question and/or clarify to the user that the graph does not contain relevant information.
  • E S: @Evren Sirin Supply Chain usage demonstration is very useful and I think applicable in business. What are the required specs for hardware for Stardog? Would you please share the estimated costs for monthly fixed costs to maintain the whole system, ie regardless of customers usage? Thank you.
  • Jiaoyan Chen: We are happy to answer questions on DeepOnto in the ChatBox.
  • Ravi Sharma: I see verification and logical merging, embedding etc.
  • Todd Schneider: Yuan, please explain ‘subsumption restructuring’.
    • Jiaoyan Chen: I am not sure which slide this phrase appears on. If it is in ontology-to-graph, it refers to our work on extracting a class hierarchy from the original ontology. There is a special case: for A \equiv B \conjunction C, we will have A \subclassof B and A \subclassof C; this is to avoid placing A under owl:Thing if we just consider the declared subsumptions of named classes for building the hierarchy. This is also the strategy Protege uses for presenting the class hierarchy.
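The special case described in this reply can be sketched as follows: when a class is defined as equivalent to a conjunction of named classes, it is placed under each conjunct instead of falling back to owl:Thing. The toy ontology and the function `build_hierarchy` are hypothetical and only illustrate the idea; this is not DeepOnto's implementation.

```python
# Hypothetical toy ontology.
named_classes = {"Beer", "Stout", "Ale", "CraftProduct", "CraftAle"}
declared_subclass = {("Stout", "Beer")}
# CraftAle is defined as equivalent to the conjunction of Ale and CraftProduct.
equivalent_to_conjunction = {"CraftAle": ["Ale", "CraftProduct"]}


def build_hierarchy():
    """Return (child, parent) edges of the class hierarchy."""
    edges = set(declared_subclass)
    # Special case: A equivalent to (B and C) yields A under B and A under C.
    for cls, conjuncts in equivalent_to_conjunction.items():
        edges |= {(cls, conjunct) for conjunct in conjuncts}
    # Only classes that still have no parent are placed under owl:Thing.
    with_parent = {child for child, _ in edges}
    edges |= {(cls, "owl:Thing") for cls in named_classes - with_parent}
    return edges


for child, parent in sorted(build_hierarchy()):
    print(child, "->", parent)
```

Without the special case, CraftAle would have no declared parent and would end up directly under owl:Thing.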
  • Gary Berg-Cross: @Yuan Can you say a bit more about how alignment between ontologies is done?
    • Jiaoyan Chen: Briefly, it fine-tunes a BERT-based binary classifier with synonyms from the ontologies to be aligned, uses the classifier to predict candidate equivalent class pairs with class labels, combines the prediction scores with lexical matching scores, and finally uses logical reasoning for consistency checking and repair (using a repairing algorithm our group developed before).
    • Ravi Sharma: In that case how is alignment taken care of?
  • Ravi Sharma: Can you tell more about OAEI?
    • Jiaoyan Chen: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/2023/index.html
    • Hang Dong: It is an onto matching benchmarking activity running for many years.
    • Jiaoyan Chen: We placed a new Bio-ML track in OAEI, which has been running for over a decade.
    • Jiaoyan Chen: Our new Bio-ML track was placed in 2022, and is continuing in 2023. This track is especially developed for ML-based OM systems.
  • John Sowa: If you use LLMs to do translation from English to CLIF, the mapping is simpler and more successful.
  • silke: Can you please give a short explanation of logic repair and how it is implemented? Thanks!
    • Jiaoyan Chen: Yes. We get mappings and their scores. Briefly, the repair algorithm merges the mappings and the ontologies to infer whether they are consistent. If not, it tries to remove some mappings with the lowest scores and checks whether the remaining mappings plus the ontologies are consistent. If yes, it stops. This procedure is iterative. The reasoning is approximated using propositional logic.
    • Jiaoyan Chen: More details are here: https://ceur-ws.org/Vol-1014/paper_63.pdf
    • silke: Thank you so much, very helpful!
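The iterative repair loop described in this thread can be sketched as below. This is a simplified illustration, not the algorithm from the linked paper: the `consistent` function is a toy stand-in for the approximate propositional reasoning (it only rejects a class mapped to two classes declared disjoint), the loop greedily drops the single lowest-scoring mapping each round, and all class names and scores are hypothetical.

```python
# Hypothetical disjointness axiom from the source ontology.
disjoint = {frozenset({"o1:Ale", "o1:Wine"})}


def consistent(mappings):
    """Toy check: no target class may be mapped to two disjoint source classes."""
    by_target = {}
    for src, tgt, _score in mappings:
        by_target.setdefault(tgt, set()).add(src)
    for sources in by_target.values():
        for a in sources:
            for b in sources:
                if a != b and frozenset({a, b}) in disjoint:
                    return False
    return True


def repair(mappings):
    """Drop lowest-scoring mappings until the remaining set is consistent."""
    kept = sorted(mappings, key=lambda m: m[2], reverse=True)
    while kept and not consistent(kept):
        kept.pop()  # the list is sorted by descending score, so this removes the lowest
    return kept


candidates = [
    ("o1:Ale", "o2:Ale", 0.95),
    ("o1:Wine", "o2:Ale", 0.40),
    ("o1:Stout", "o2:Stout", 0.90),
]
print(repair(candidates))
# [('o1:Ale', 'o2:Ale', 0.95), ('o1:Stout', 'o2:Stout', 0.9)]
```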
  • John Sowa: For any mapping from NL to any other notation, you need an "echo".
    • John Sowa: Whenever you type anything in English, the system should produce an echo in English to show exactly how your input was interpreted.
    • Douglas Miles: I have asked it to translate the CLIF back to English, then told it to tell me if the original English matches its response and to modify the English->CLIF. This 2nd round produces much better results
    • John Sowa: If the echo is not what you wanted, you can revise your question.
  • James LOGAN: Can DeepOnto create axioms that seem to always hold true in some domain from a text corpus?
    • Jiaoyan Chen: Not yet. We are now trying to extract new concepts from text and insert them into the ontology (there is some ongoing work: https://arxiv.org/abs/2306.14704). We haven’t considered axioms, only concepts. It’s a good idea for a future extension.
  • Todd Schneider: Please explain ‘subsumption restructuring’.
    • Yuan He: We introduced subsumption axioms between the parent and child concepts of a concept targeted for removal.
    • James LOGAN: It seems this would require jumping to conclusions or having a way to close the world
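Yuan He's description of subsumption restructuring (linking the children of a removed concept directly to its parents) can be sketched on a set of (child, parent) edges. The function name `remove_concept` and the example classes are hypothetical; this is an illustration of the idea, not DeepOnto's code.

```python
def remove_concept(edges, target):
    """Remove target from the hierarchy and reconnect its children to its parents."""
    parents = {p for c, p in edges if c == target}
    children = {c for c, p in edges if p == target}
    kept = {(c, p) for c, p in edges if target not in (c, p)}
    # New subsumption axioms: every child of target becomes a child of every parent.
    return kept | {(c, p) for c in children for p in parents}


hierarchy = {("Stout", "Ale"), ("Porter", "Ale"), ("Ale", "Beer")}
print(sorted(remove_concept(hierarchy, "Ale")))
# [('Porter', 'Beer'), ('Stout', 'Beer')]
```

The added edges keep the remaining concepts connected, so removing an intermediate concept does not leave its children without ancestors.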
  • Mike Bennett: Yuan is the ontology alignment etc. dependent on any specific TLO? Or can different TLOs be used as the basis for this?
    • Jiaoyan Chen: I don’t know what’s TLO, but I think not …
    • Mike Bennett: Top Level Ontology
    • Yuan He: Depending only on the input ontologies is sufficient.
    • Jiaoyan Chen: No, it does not
    • Todd Schneider: ‘Top Level Ontology’ equivalent to ‘Foundational Ontology’
  • E S: Is there any accuracy problem in building KG with other languages than English?
    • Jiaoyan Chen: DeepOnto currently is tested only for English ontologies
    • Andrea Westerinen: You need the appropriate training for LLMs. So, you have translation and translation errors.

Resources

Previous Meetings

  • ConferenceCall 2023 10 18: A look across the industry, Part 1
  • ConferenceCall 2023 10 11: Setting the stage
  • ConferenceCall 2023 10 04: Overview

Next Meetings

  • ConferenceCall 2023 11 01: Demos of information extraction via hybrid systems
  • ConferenceCall 2023 11 08: Broader thoughts
  • ConferenceCall 2023 11 15: Synthesis