ConferenceCall 2024 03 20

Ontology Summit 2024 Foundations and Architectures

* Session: Foundations and Architectures
* Duration: 1 hour
* Date/Time: 20 Mar 2024, 16:00 GMT (9:00am PDT / 12:00pm EDT / 4:00pm GMT / 5:00pm CET)
* Convener: Ravi Sharma

== Agenda ==
* Till Mossakowski: Neuro-symbolic integration for ontology-based classification of structured objects ([https://bit.ly/3TDfNqM Slides])
** Abstract: Reference ontologies play an essential role in organising knowledge in the life sciences and other domains. They are built and maintained manually. Since this is an expensive process, many reference ontologies only cover a small fraction of their domain. We develop techniques that enable the automatic extension of the coverage of a reference ontology by extending it with entities that have not been manually added yet. The extension shall be faithful to the (often implicit) design decisions by the developers of the reference ontology. While this is a generic problem, our use case addresses the Chemical Entities of Biological Interest (ChEBI) ontology with classes of molecules, since the chemical domain is particularly suited to our approach. ChEBI provides annotations that represent the structure of chemical entities (e.g., molecules and functional groups).<br/>We show that classical machine learning approaches can outperform ClassyFire, a rule-based system that represents the state of the art for classifying new molecules and is already being used for the extension of ChEBI. Moreover, we develop RoBERTa and Electra transformer neural networks that achieve even better performance. In addition, the axioms of the ontology can be used during the training of prediction models as a form of semantic loss function. Furthermore, we show that ontology pre-training can improve the performance of transformer networks for the task of predicting the toxicity of chemical molecules. Finally, we show that our model learns to focus attention on more meaningful chemical groups when making predictions with ontology pre-training than without, paving a path towards greater robustness and interpretability. This strategy has general applicability as a neuro-symbolic approach to embed meaningful semantics into neural networks.
** Bio: Till Mossakowski is a professor of theoretical computer science at Otto-von-Guericke University of Magdeburg, Germany. He has co-designed the distributed ontology, model and specification language DOL, as well as the corresponding Heterogeneous Tool Set. His research interests are logic, knowledge representation, semantics, and neural-symbolic integration, as well as applications in energy network simulation models, chemistry and material sciences.
** [https://bit.ly/3Tzib0u Video Recording]
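The semantic loss mentioned in the abstract can be made concrete with a minimal sketch (an illustrative simplification, not the speaker's implementation; the function and class indices here are hypothetical): for a subsumption axiom stating that class A is a subclass of class B, a consistent multi-label classifier should never assign a higher probability to A than to B, so violations of the axiom can be penalized during training.

<syntaxhighlight lang="python">
import torch

def semantic_subsumption_loss(probs, subsumptions):
    """Penalize predictions that violate subsumption axioms.

    probs: (batch, num_classes) sigmoid outputs of a multi-label classifier.
    subsumptions: pairs (a, b) encoding the axiom "class a is subsumed by
        class b"; a consistent prediction satisfies probs[:, a] <= probs[:, b].
    """
    loss = probs.new_zeros(())
    for a, b in subsumptions:
        # Hinge-style penalty: positive only where p(A) exceeds p(B).
        loss = loss + torch.clamp(probs[:, a] - probs[:, b], min=0).mean()
    return loss

# Toy example with the axiom "class 0 is subsumed by class 1";
# the first row violates it (0.9 > 0.4), the second does not.
probs = torch.tensor([[0.9, 0.4], [0.2, 0.8]])
print(semantic_subsumption_loss(probs, [(0, 1)]))  # tensor(0.2500)
</syntaxhighlight>

In training, such a term would be added to the ordinary classification loss, letting the ontology's axioms shape the network's outputs without hard-coding them as rules.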


== Conference Call Information ==
* Date: Wednesday, 20 March 2024
* Start Time: 9:00am PDT / 12:00pm EDT / 5:00pm CET / 4:00pm GMT / 1600 UTC
** ref: World Clock
** Note: The US and Canada are on Daylight Saving Time while Europe has not yet changed.
* Expected Call Duration: 1 hour
* The unabbreviated URL is: https://us02web.zoom.us/j/87630453240?pwd=YVYvZHRpelVqSkM5QlJ4aGJrbmZzQT09

== Participants ==
* Lotti Tu
* [[MichaelGruninger|Michael Gr&uuml;ninger]]
* [[PhilJackson|Phil Jackson]]
* [[RaviSharma|Ravi Sharma]]
* Riley Moher


== Discussion ==
[12:11] Ravi Sharma: Are the symbols of chemistry so well understood that the system can not only identify the chemical but also whether these are allowed based on valence etc.?
[12:15] Ravi Sharma: Till, the classical approach is more than a training set, as it incorporates rules of chemistry as well?
[12:21] Ravi Sharma: where do you use reasoners?
[12:22] Ravi Sharma: so AI enhances deep learning further but does it use transformers?
[12:23] John Sowa: For many important applications, 99% accuracy is unacceptable.  In finance, for example, accuracy to a fraction of a cent is an absolute requirement.
[12:25] John Sowa: How can formal deduction be used to ensure absolute consistency with a database and a precise ontology?
[12:25] Ravi Sharma: Bidirectional learning with filtering is a great feature, but how much of the improvement is due to this?
[12:28] Phil Jackson: Can the approach be applied to discovery of patterns of information within DNA molecules?
[12:28] Ravi Sharma: is the pretraining improvement because it learns the chemical rules?
[12:28] Douglas R. Miles: Can a Transformer be used to store clause rules?
[12:29] Douglas R. Miles: store and suggest rules that is*
[12:41] Todd Schneider: John, wouldn’t it be the case that LLMs and neural networks should not be used for some applications?
[12:46] Michael Gr&uuml;ninger: No — the point is that the models of the axioms are isomorphic  to graphs (as mathematical structures)
[12:47] Michael Gr&uuml;ninger: It’s not the language that is graphical
[12:48] Todd Schneider: Michael, is your response to John’s comment about ‘linear’ languages and graphs?
[12:50] Michael Gr&uuml;ninger: Yes
[12:57] John Sowa: Every graph can be linearized, and every linear notation can be mapped to a graph.
[12:58] John Sowa: But there are an enormous number of different kinds of graphs and linear notations.
[12:58] John Sowa: For different applications and different problems or questions, different representations may be better than others.
[12:59] John Sowa: Fundamental principle:  there is no single paradigm that is ideal for all possible applications.
[13:00] John Sowa: Two paradigms are better than one, and multiple paradigms are even better.
[13:00] Michael Gr&uuml;ninger: "A Molecular Structure Ontology for Medicinal Chemistry", Chui and Gr&uuml;ninger, FOIS 2016
[13:00] Riley Moher: Does the attention in the classifier reveal any kind of features or structural properties of graphs? And would you expect this to be correlated with axioms of the ontology?
[13:02] Fabian Neuhaus: @Michael: we have also translated the structural representation in ChEBI into FOL. Unfortunately, many non-leaf nodes in ChEBI do not contain structural definitions, and of those that exist, many are incomplete. Thus, the translation to FOL led to wrong axioms / definitions.
[13:04] Lotti Tu: thank you for the great talk!
[13:05] Riley Moher: Thank you, very interesting
[13:06] Bev Corwin: Thank you
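Sowa's remark at [12:57], that every graph can be linearized and every linear notation can be mapped to a graph, is exactly the relationship between molecular graphs and SMILES strings in the chemistry setting of this talk. A minimal round-trip sketch, assuming the open-source RDKit toolkit (an illustration; RDKit is not referenced in the talk notes):

<syntaxhighlight lang="python">
from rdkit import Chem

# Parse the linear SMILES notation for ethanol into a molecular graph.
mol = Chem.MolFromSmiles("CCO")

# Walk the graph: atoms are nodes, bonds are edges.
for bond in mol.GetBonds():
    print(bond.GetBeginAtom().GetSymbol(), "-", bond.GetEndAtom().GetSymbol())
# prints: C - C
#         C - O

# Linearize the graph back to a canonical SMILES string.
print(Chem.MolToSmiles(mol))  # CCO
</syntaxhighlight>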


== Resources ==
* [https://bit.ly/3TDfNqM Slides]
* [https://bit.ly/3Tzib0u Video Recording]


== Previous Meetings ==

* [[ConferenceCall 2024 03 13]] (LLMs, Ontologies and KGs)
* [[ConferenceCall 2024 03 06]] (LLMs, Ontologies and KGs)
* [[ConferenceCall 2024 02 28]] (Foundations and Architectures)
* ... further results

== Next Meetings ==
* [[ConferenceCall 2024 03 27]] (Foundations and Architectures)
* [[ConferenceCall 2024 04 03]] (Synthesis)
* [[ConferenceCall 2024 04 10]] (Synthesis)
* ... further results