
Ontolog Forum

  • Session: Demos of information extraction via hybrid systems
  • Duration: 1 hour
  • Date/Time: 1 November 2023, 16:00 GMT
    • 9:00am PDT / 12:00pm EDT
    • 4:00pm GMT / 5:00pm CET
  • Convener: Andrea Westerinen and Mike Bennett

Ontology Summit 2024 Fall Series: Demos of information extraction via hybrid systems

Agenda

  • Andrea Westerinen, creator of DNA (Deep Narrative Analysis)
    • Title: Populating Knowledge Graphs: The Confluence of Ontology and Large Language Models
    • Abstract: Ontology-based Knowledge Graphs (KGs) stand at the forefront of semantic data representation, providing structured views of the data in complex domains. Traditionally, populating these KGs from unstructured text involved convoluted natural language analyses and custom code, but the environment has changed with the advent of Large Language Models (LLMs). This talk explores one use case: the population of a KG from news articles. The evolution of the DNA application from employing spaCy APIs to OpenAI is described, and the current (open-source) implementation is discussed. Implementation issues are reviewed, including sourcing the data, designing the LLM prompts, mapping the LLM responses onto the ontology, and populating the knowledge graph. (A generic sketch of this extraction pattern appears after the agenda.)
    • Slides
  • Prasad Yalamanchi, Lead Semantics CTO
    • Title: Harvest Knowledge From Language - Harness the power of Large Language Models and Semantic Technology
    • Abstract: Language (both text and voice) holds much of the knowledge accessible to humans and is the best store of collective human knowledge. Historically, access to this knowledge was manual, and until recently it had progressed only to varying degrees of semi-automation. With the advent of language models, and particularly Large Language Models in the last couple of years, fully automated access to the knowledge carried in language is becoming a reality. TextDistil, the software product from Lead Semantics, applies LLMs and ontologies to extract computable knowledge, in the form of RDF triples, from text.
    • Slides
  • Video Recording
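
Both abstracts describe the same broad pattern: prompt an LLM to pull structured statements out of text, constrain the output to an ontology's vocabulary, and load the result into an RDF graph. The details of the DNA and TextDistil pipelines are not given in these notes, so the following Python sketch is purely illustrative; the model name, prompt wording, example namespace (EX), predicate vocabulary, and the extract_triples/populate_graph helpers are all assumptions, not either speaker's implementation.

```python
# Illustrative sketch only: prompt an LLM for subject/predicate/object
# statements, keep those that fit a small ontology vocabulary, and load
# them into an RDF graph. Model, prompt, namespace, and vocabulary are
# hypothetical, not the DNA or TextDistil implementations.
import json

from openai import OpenAI                      # OpenAI Python SDK (v1)
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/news#")     # stand-in ontology namespace
ALLOWED_PREDICATES = {"employs", "locatedIn", "acquired"}  # toy vocabulary

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_triples(text: str) -> list[dict]:
    """Ask the LLM for JSON triples restricted to the allowed predicates."""
    prompt = (
        "Extract factual statements from the text below as a JSON array of "
        'objects with keys "subject", "predicate", "object". Use only these '
        f"predicates: {sorted(ALLOWED_PREDICATES)}. Return only JSON.\n\n"
        f"Text:\n{text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # assumed; the talk contrasts spaCy- and OpenAI-based versions
        messages=[{"role": "user", "content": prompt}],
    )
    # A real pipeline would validate this; LLMs do not always return clean JSON.
    return json.loads(response.choices[0].message.content)


def populate_graph(triples: list[dict]) -> Graph:
    """Map LLM output onto the ontology's terms and build the RDF graph."""
    g = Graph()
    g.bind("ex", EX)
    for t in triples:
        if t["predicate"] not in ALLOWED_PREDICATES:
            continue  # discard anything that does not map onto the ontology
        g.add((EX[t["subject"].replace(" ", "_")],
               EX[t["predicate"]],
               Literal(t["object"])))
    return g


if __name__ == "__main__":
    article = "Acme Corp, located in Boston, acquired Widget Inc. last year."
    print(populate_graph(extract_triples(article)).serialize(format="turtle"))
```

The filtering step in populate_graph is one simple way to handle the "mapping the LLM responses onto the ontology" issue that Andrea's abstract calls out: output that does not fit the vocabulary is never admitted into the graph.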

Conference Call Information

  • Date: Wednesday, 1 November 2023
  • Start Time: 9:00am PDT / 12:00pm EDT / 5:00pm CET / 4:00pm GMT / 1600 UTC
    • Note that Daylight Saving Time has ended in Europe but not in the US or Canada.
    • ref: World Clock
  • Expected Call Duration: 1 hour
  • Video Conference URL: https://bit.ly/48lM0Ik
    • Conference ID: 876 3045 3240
    • Passcode: 464312

The unabbreviated URL is: https://us02web.zoom.us/j/87630453240?pwd=YVYvZHRpelVqSkM5QlJ4aGJrbmZzQT09

Participants

The complete list of participants was not captured.

Discussion

  • Ravi Sharma: What is the typical accuracy, and does the accuracy of the triples vary by language?
    • Ravi Sharma: That is, would it depend only on the language, or would the domain concepts also affect the KG?
  • Michael DeBellis: Is the ontology COMPLETELY created from the Corpus, or do you start from a foundation ontology and extend it based on the Corpus?
  • Ravi Sharma: You implied preprocessing and semantic understanding by humans before the KG is generated? How much effort is that?
  • Ravi Sharma: Is there a possibility to reduce the duration by compromising the accuracy somewhat?
  • Susanne Vejdemo: I understand we have a KG constructed from the unstructured docs. And then there’s translation of your query to triples? I am a bit uncertain where the LLM comes into this?
    • Janet Singer: My question as well — how exactly does the LLM come in?
    • Andrea Westerinen: I will address this in my presentation, but can’t answer for Prasad.
    • Prasad Yalamanchi: The LLM comes in at multiple places in the TextDistil pipeline: once at the final summary string of the result items, and also in building the KG.
    • Bart Gajderowicz: Prasad, what does the LLM do in building the KG?
  • Ravi Sharma: Can it show visuals, such as the KG, while it is in progress?
  • Douglas Miles: ChatGPT-3.5 in some instances can work well enough to be used over ChatGPT-4?
    • Andrea Westerinen: It MAY, but I have found profound differences. Linguistic analysis is much better in 4.
    • Douglas Miles: ChatGPT-3.5 can be so much faster with its returned results. I've considered running both to see when 3.5 was sufficient; admittedly it mostly isn't, but "convert this to OWL" often is "acceptable".
    • Andrea Westerinen: @Douglas Miles Not sure that I agree about acceptability.
    • Douglas Miles: OK, true. I mean 3.5 can't even begin to convert to CLIF or CycL, whereas it at least tried with RDF/OWL.
  • Ravi Sharma: If there are no human interventions, how much is the KG affected?
    • Michael DeBellis: Ravi, that's my question as well. IMO it is usually better to have a person in the loop, because an ontology created completely from a corpus may not be well designed. That's why I asked the question about starting from a basic ontology and then extending it, rather than creating the entire ontology from scratch.
  • Ravi Sharma: Prasad - If you do two such exercises, is the result the same/repeatable?
    • Prasad Yalamanchi: There is a language interpretation step that maps the query string to the ontology.
    • Prasad Yalamanchi: So, if two queries (exercises, as you mentioned) result in the same interpretation, then the final answers will be the same. (A minimal illustration of this determinism follows the discussion.)
  • Douglas Miles: Something that impresses me and is unique about Andrea's work (even a year or two ago): she actually supports full modality representations in RDF-ish languages, stuff that normally I would only dare to represent in CLIF!
  • Ravi Sharma: Are there similar possibilities of using rhetorical devices (metaphor, context, explanation, etc.) to improve your results?
    • Andrea Westerinen: I am not sure what is being asked. I am exposing the use of rhetorical devices to help readers understand how the text might be affecting their interpretations of it.
  • Ravi Sharma: Andrea, does ML or AI enter this exercise and the results you showed? If so, where?
  • Ravi Sharma: I mean, what ML and training sets were used in OpenAI?
    • Andrea Westerinen: OpenAI's complete technology stack is not disclosed but their website says "We build our generative models using a technology called deep learning, which leverages large amounts of data to train an AI system to perform a task."
  • Mark Underwood: Excellent presentations and important work for the ontology community
  • Sundos Al Subhi: Thank you all!! Great information.
  • Janet Singer: Excellent presentations — Looking forward to seeing these ideas integrated in the future session(s)
  • Douglas Miles: (there is no question that the KR Andrea is doing is rock solid!) Here is my question though: Are any of the RDF reasoners good enough to do the reasoning/query that Andrea expects?
    • Andrea Westerinen: @Douglas Miles Yes, I use Stardog. It also allows use of Voicebox, which encodes NL queries in SPARQL!
  • Dan (Telicent): Thank you for your presentations 🙂
  • Zefi Kavvadia: Thank you!
  • Mariusz (Telicent): Thank you.
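
Prasad's answers above describe a two-stage query pipeline: the LLM interprets the natural-language question against the ontology (producing, for example, a SPARQL query), and that query is then evaluated deterministically over the KG, which is why identical interpretations yield identical answers. The following is a minimal, hypothetical illustration of the deterministic half, using rdflib; the data and query are invented, and the actual TextDistil interpretation step is not shown in these notes.

```python
# Minimal illustration of the deterministic half of a query pipeline:
# once an NL question has been interpreted into SPARQL (here hard-coded,
# standing in for the LLM's output), evaluating it over the graph is
# repeatable. Data and query are invented for illustration.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/news#")
g = Graph()
g.add((EX.AcmeCorp, EX.locatedIn, Literal("Boston")))
g.add((EX.WidgetInc, EX.locatedIn, Literal("Austin")))

# Suppose the LLM translated "Which companies are in Boston?" into this:
sparql = """
PREFIX ex: <http://example.org/news#>
SELECT ?company WHERE { ?company ex:locatedIn "Boston" . }
"""

# Same query string + same graph = same answer, every run.
for row in g.query(sparql):
    print(row.company)  # -> http://example.org/news#AcmeCorp
```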

Resources

Previous Meetings

  • ConferenceCall 2023 10 25: A look across the industry, Part 2
  • ConferenceCall 2023 10 18: A look across the industry, Part 1
  • ConferenceCall 2023 10 11: Setting the stage
  • ... further results

Next Meetings

  • ConferenceCall 2023 11 08: Broader thoughts
  • ConferenceCall 2023 11 15: Synthesis
  • ConferenceCall 2024 02 21: Overview
  • ... further results