From OntologPSMW

Session COVID-19 KGs
Duration 1 hour
Date/Time 09 Feb 2022 17:00 GMT
9:00am PST/12:00pm EST
5:00pm GMT/6:00pm CET
Convener Ravi Sharma
Track Disaster Landscape


Pandemics and Other Disasters     (2A)

Agenda     (2B)

  • Chris Mungall and Justin Reese: COVID-19 Knowledge Graphs     (2B1)
  • Chris Mungall is Department Head of Biosystems Data Science at Lawrence Berkeley National Laboratory. His research interests center on the capture, computational integration, and dissemination of biological research data, and on the development of methods for using these data to elucidate the biological mechanisms underpinning the health of humans and of the planet. He is especially interested in developing and applying knowledge-based AI methods, in particular Knowledge Graphs (KGs), as an approach for integrating and reasoning over multiple types of data. Dr. Mungall and his team have led the creation of key biological ontologies for the integration of resources covering gene function, anatomy, phenotypes and the environment. He is a PI on major projects such as the Gene Ontology (GO) Consortium, the Monarch Initiative, the NCATS Biomedical Data Translator, and the National Microbiome Data Collaborative project. In collaboration with Stanford University, Dr. Mungall is leading the creation of a new BioPortal Knowledge Graph.     (2B2)
  • Justin Reese is a Computer Research Scientist at LBNL. His research focuses on using computational methods to extract actionable knowledge from biomedical and biological data, and in particular, developing performant graph machine learning algorithms to extract knowledge from biomedical knowledge graphs. Dr. Reese, along with Dr. Mungall, led the KG-Covid-19 project, which included methods for performing inference over the KG. The knowledge graphs generated by the project have been leveraged by the National Virtual Biotechnology Laboratory (NVBL) Therapeutics project and the National COVID Cohort Collaborative (N3C) for accessing integrated COVID-related data. Starting in 2022, Dr. Reese will lead the LBNL team in a new project to develop machine learning tools and best practices to improve the response to COVID-19, leveraging a large range of patient-level COVID-19 data in the National COVID Cohort Collaborative (N3C) Enclave.     (2B3)

Conference Call Information     (2C)

Attendees     (2D)

Discussion     (2E)

[12:12] Chris Mungall: Hi everyone!     (2E1)

[12:13] RaviSharma: hello and welcome     (2E2)

[12:14] RaviSharma: It seems that one of the labs in DoD, such as those doing cancer and related research at NIH at Ft Detrick, could also be a stakeholder?     (2E3)

[12:18] RaviSharma: Are you also connected to the Livermore lab?     (2E4)

[12:19] Chris Mungall: We are a distinct lab. We have a few collaborators at other national labs, but I personally don't have many contacts at LLNL.     (2E5)

[12:20] Chris Mungall: If you have any contacts at Ft Detrick, would love to talk to them!     (2E6)

[12:20] Chris Mungall: Justin will talk about the NIH collaboration shortly     (2E7)

[12:23] RaviSharma: How is the drug's in situ interaction with the ENT or nose simulated, and similarly with pulmonary and other organs?     (2E8)

[12:23] BobbinTeegarden: Can you talk more about semantic similarity calculations?     (2E9)

[12:24] Chris Mungall: KGX: -- this is a simple TSV-based exchange format for semantic Labeled Property Graphs     (2E10)
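
As a rough illustration of a TSV-based exchange format for labeled property graphs, the sketch below parses one node table and one edge table into an in-memory graph. The column names (`id`, `category`, `name` for nodes; `subject`, `predicate`, `object` for edges) reflect common KGX conventions as an assumption, and the identifiers and rows are made up for illustration; real files carry more columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical node and edge tables (tab-separated); the EX: identifiers are invented.
NODES_TSV = (
    "id\tcategory\tname\n"
    "EX:1\tbiolink:Drug\texample drug\n"
    "EX:2\tbiolink:Protein\texample protein\n"
)
EDGES_TSV = (
    "subject\tpredicate\tobject\n"
    "EX:1\tbiolink:interacts_with\tEX:2\n"
)


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def build_graph(nodes_text, edges_text):
    """Return (nodes keyed by id, outgoing edges keyed by subject id)."""
    nodes = {row["id"]: row for row in read_tsv(nodes_text)}
    out_edges = defaultdict(list)
    for row in read_tsv(edges_text):
        out_edges[row["subject"]].append((row["predicate"], row["object"]))
    return nodes, dict(out_edges)


nodes, out_edges = build_graph(NODES_TSV, EDGES_TSV)
```

Keeping nodes and edges in separate flat tables means the same files can be read by a script like this one or bulk-loaded into a graph database.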

[12:24] Chris Mungall: The semantic web community are moving in this direction with 1G     (2E11)

[12:25] Chris Mungall: Semantic Similarity: let me pull up some references..     (2E12)

[12:26] Chris Mungall: Ravi: that kind of contextualized knowledge may go beyond what we have in the KG at the moment. We may have tissue-independent representations of the underlying biological pathways but are lacking tissue-specific models     (2E13)

[12:27] Chris Mungall: Here is a good summary of semantic similarity with ontologies:     (2E14)

[12:29] Chris Mungall: This is the performant rust graph library:     (2E15)

[12:37] Chris Mungall: going back to the semantic similarity question, Fig. 2 in the Phenomizer paper gives a sense of how this can be used to match patients to diseases/phenotypic clusters, making use of the ontology hierarchy:     (2E16)
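
One common way to use the ontology hierarchy for matching is a Resnik-style measure: two terms are as similar as the information content (IC) of their most informative common ancestor, and a query is scored against a disease by averaging each query term's best match. The sketch below shows that idea on a toy hierarchy. The ontology, annotations, and function names are all hypothetical, and it is an assumption that this corresponds to the formulation in the paper mentioned above.

```python
import math

# Hypothetical toy ontology (child -> parents); not taken from any real ontology.
PARENTS = {"root": set(), "A": {"root"}, "B": {"root"}, "A1": {"A"}, "A2": {"A"}}

# Hypothetical annotations: each disease is annotated with phenotype terms.
ANNOTATIONS = {"d1": {"A1"}, "d2": {"A2"}, "d3": {"B"}, "d4": {"A1", "B"}}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(term) = log(total / n), where n counts diseases annotated to the
    term or to any of its descendants. Rarer terms get higher IC."""
    total = len(ANNOTATIONS)
    ic = {}
    for term in PARENTS:
        n = sum(
            1
            for terms in ANNOTATIONS.values()
            if any(term in ancestors(t) for t in terms)
        )
        if n:
            ic[term] = math.log(total / n)
    return ic


IC = information_content()


def resnik(t1, t2):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(t1) & ancestors(t2)
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def match_score(query_terms, disease):
    """Average, over query terms, of the best match among the disease's terms."""
    targets = ANNOTATIONS[disease]
    best = [max(resnik(q, t) for t in targets) for q in query_terms]
    return sum(best) / len(best)


def rank_diseases(query_terms):
    """Diseases ordered from best to worst match for the query."""
    return sorted(ANNOTATIONS, key=lambda d: match_score(query_terms, d), reverse=True)
```

Because the score comes from a named common ancestor, each match can be explained by pointing at the shared ontology term.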

[12:37] BobbinTeegarden: Thank you, Chris!     (2E17)

[12:39] Chris Mungall: We are not using semantic similarity explicitly with the KG embedding approach, but there are similarities -- we can measure similarity between the embedded vectors for any two concepts or individuals using standard measures like cosine similarity. This exploits a wider range of links in the KG than standard semantic similarity, but with some loss in explainability.     (2E18)
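
The cosine similarity mentioned above can be sketched in a few lines. The embedding vectors and concept names below are hypothetical placeholders, not output of any real KG embedding; in practice the vectors would come from an embedding method and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(name, embeddings, k=3):
    """Return the k entries most similar to `name`, best first."""
    query = embeddings[name]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2-dimensional embeddings, for illustration only.
EMBEDDINGS = {
    "concept_a": [1.0, 0.0],
    "concept_b": [0.9, 0.1],
    "concept_c": [0.0, 1.0],
}

nearest = most_similar("concept_a", EMBEDDINGS, k=1)
```

Unlike an ontology-based score, the result here is just a number between -1 and 1 with no shared ancestor to point to, which is the loss in explainability noted above.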

[12:45] RaviSharma: I have questions on ecosystem and drug interactions?     (2E19)

[12:50] Mike Bennett: Have you needed to do anything to reflect the distinction between data qua data, and things represented by data in the KG? Or are they indistinguishable in this scenario?     (2E20)

[12:54] Chris Ahern: Which KG database do you use?     (2E21)

[12:55] Chris Mungall: We basically do everything in memory using the ensmallen library... but we also make everything available in a triplestore, and the KG files can easily be loaded into Neo4j. We have been using Blazegraph as the triplestore, but it is no longer developed; we are looking for (open source) replacements     (2E22)

[12:56] BobbinTeegarden: If you change contexts/perspectives, can you discover new things in the same collection of data/ontology/KG (in a holonic way)?     (2E23)

[12:57] Chris Mungall: Good Q Mike, I would say we focus on knowledge over raw data, and there are upstream processes to normalize, e.g. RNAseq data -> simple X expressed-in Y triples     (2E24)

[12:58] Chris Mungall: Thanks - very interesting. We've used both Neo4j and AllegroGraph but not extensively enough to gauge their performance at scale. Your in-memory approach no doubt speaks to that.     (2E25)

[13:01] Ken Baclawski: Yes, that is the reference for KGSQL.     (2E27)

Resources     (2F)

Previous Meetings     (2G)

Next Meetings     (2H)