Ontolog Forum

Workshop Exploring the Role of Context in Evolving an Open Knowledge Health Network and Developing a Health Knowledge Graph

Main Idea

Big Data available as centralized and distributed medical health records has immense potential to further the automation of healthcare in support of patient, stakeholder and research interests (Luo et al, 2016). One problem is the difficulty of accessing, understanding and integrating the totality of data. Part of the challenge, then, is how to extract this data from diverse sources and process it so that it is appropriately open and available in usable form, and how to grow and refine it over time. As part of such an effort, contextualizing the data may be important for validating and understanding data, for data reuse and integration, and for privacy protection. This workshop will look at some of the issues of providing the context needed to assemble knowledge graphs. Useful contextualization of this knowledge might include clearly declared relations to the sources from which it was extracted, the situations within which the extracted knowledge is understood, as well as temporal and provenance information for extracted knowledge.

The topic of the workshop grows out of the vision of the proposed Open Knowledge Network (OKN) effort discussed in Summit sessions. This vision, in part, is one of constructing a public knowledge space consisting of sustainable data ecosystems for various domains including the biomedical, manufacturing, and geoscience domains. As part of our OKN sessions we heard about both deep and shallow (lightweight) approaches that might be employed to provide context. Given its brief history as part of private sector efforts (Google Knowledge Graphs for example), OKN is likely to start with lightweight efforts based on its stated philosophy:

Our philosophy:

  • at all points, system should enable gradual adoption, with low costs to publishers and users.

That means:

  • No heavy syntax (e.g., avoid heavy reification)
  • No required coordination between publishers
  • Limited novel languages

(from https://www.nitrd.gov/nitrdgroups/images/6/6a/OKN_Horizontal_BreakoutReport.pdf)

However, it is worth considering both points of view using real examples to gauge the opportunities and limitations of each. Among the deep semantic approaches are Microtheories [MTs] and Ontology Design Patterns [ODPs]. Within a specific MT, for example, knowledge content is logically consistent, supporting safe reuse and integration; an MT is a smaller piece that captures an intuitive but critical aspect of a domain. Another approach discussed at previous Summits might employ ODPs: small, modular ontologies based on the working hypothesis that patterns are useful vehicles for encapsulating knowledge. ODPs do not attempt to systematically cover a large part of a domain. They may be built from the bottom up using real data, but they also leverage existing ontologies and vocabularies along the way. They are somewhat lightweight but are usually bigger and richer than triples. Rich ontology approaches can be used to contextualize ontological knowledge, but they are used less often than simpler approaches such as metadata annotations. They should also be compared to lighter weight efforts such as leveraging RDF to provide simple context.

Vinh Nguyen (formerly at Kno.e.sis Center, now at NLM), a member of the OKN community, will co-lead this work group along with Gary Berg-Cross to test some lightweight design choices available for OKN in the healthcare domain. In this approach RDF triples are contextualized as RDF++, in which additional triples serve a metadata role and define a small set of context for RDF facts. This context includes the provenance, time, information location, trust score (a probability) and similar properties of a target fact triple, each expressed as a metadata RDF triple. For more on this see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4350149/
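To make the idea of metadata triples about a fact triple concrete, the following is a minimal sketch of one possible pattern: each statement gets its own unique property (sometimes called a singleton property), and the context is attached to that property. It is not necessarily the representation the work group will adopt. The `ex:` names, the `contextualize` and `context_of` helpers, and the sample values are all hypothetical, and plain Python tuples stand in for RDF triples.

```python
from itertools import count

# Hypothetical counter used to mint a unique property per statement.
_statement_ids = count(1)


def contextualize(fact, context):
    """Restate `fact` through a unique property and attach `context` to it.

    `fact` is a (subject, predicate, object) tuple; `context` maps
    metadata predicates (provenance, time, trust score, ...) to values.
    """
    subject, predicate, obj = fact
    singleton = f"{predicate}#{next(_statement_ids)}"
    triples = [
        (subject, singleton, obj),                        # the fact itself
        (singleton, "ex:singletonPropertyOf", predicate),  # link to the generic property
    ]
    # One metadata triple per context entry, all about this one statement.
    triples.extend((singleton, key, value) for key, value in context.items())
    return triples


def context_of(triples, fact):
    """Collect the metadata recorded for `fact` in `triples`."""
    subject, predicate, obj = fact
    singletons = {s for s, p, o in triples
                  if p == "ex:singletonPropertyOf" and o == predicate}
    stated = {p for s, p, o in triples
              if s == subject and o == obj and p in singletons}
    return {p: o for s, p, o in triples
            if s in stated and p != "ex:singletonPropertyOf"}


# Illustrative values only; none of these come from real records.
fact = ("ex:patient1", "ex:hasDiagnosis", "ex:DiseaseB")
graph = contextualize(fact, {
    "ex:source": "ex:ehr_record_42",
    "ex:recordedOn": "2018-03-01",
    "ex:trustScore": 0.8,
})
```

Calling `context_of(graph, fact)` returns the source, date and trust score attached to that one statement, while the original fact remains a single triple with no reification structure around it.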

Little work exists on formalizing even this small view of context, and it has been noted in the literature that "few studies have focused on the formal representation of temporal constraints and temporal relationships within clinical research data in the biomedical research community" (Chen et al, 2018). The Time Event Ontology (TEO) is one effort geared toward providing temporal annotations in clinical contexts (Chen et al, 2018).

We would like to explore how this approach might be used to evolve a Health KG suitable for analytic applications over modern healthcare data such as patients’ electronic health records (EHRs).

Example of a Health Issue Research Scenario

As part of OKN discussions the following was suggested as an interesting Open Knowledge Research Issue: A patient has Mutation A and Disease B. The doctor asks: what other bad things can happen? A patient has Mutation A and Diseases B and C. The doctor asks: what drugs are suitable? For this we would need: 1. A natural interface for medical providers, and 2. A structured database of medical knowledge from researchers (our focus).
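As a rough illustration of what the structured database in item 2 would need to support, here is a minimal sketch of the two questions posed as lookups over a handful of triples. Every fact, relation name and function below is a hypothetical placeholder (there is no medical content in it); it only shows the shape of the queries, not a proposed schema.

```python
# Hypothetical placeholder facts; the names carry no medical meaning.
FACTS = {
    ("MutationA", "increases_risk_of", "ConditionX"),
    ("DiseaseB", "may_lead_to", "ConditionY"),
    ("DrugP", "treats", "DiseaseB"),
    ("DrugQ", "treats", "DiseaseC"),
    ("DrugR", "treats", "DiseaseB"),
    ("DrugP", "contraindicated_with", "DiseaseC"),
}

RISK_RELATIONS = {"increases_risk_of", "may_lead_to"}


def possible_complications(findings):
    """What other bad things can happen, given the patient's findings?"""
    return sorted(o for s, p, o in FACTS
                  if s in findings and p in RISK_RELATIONS)


def suitable_drugs(findings):
    """Drugs that treat some finding and are not contraindicated by another."""
    treating = {s for s, p, o in FACTS if p == "treats" and o in findings}
    excluded = {s for s, p, o in FACTS
                if p == "contraindicated_with" and o in findings}
    return sorted(treating - excluded)


complications = possible_complications({"MutationA", "DiseaseB"})
drugs = suitable_drugs({"MutationA", "DiseaseB", "DiseaseC"})
```

Even this toy version shows why context matters: whether a "treats" or "contraindicated_with" statement can be trusted depends on its source, date and the situation in which it was established.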

This may be part of our discussion, but we will look at a small subset of the available information that might be assembled into a KG. Some sources (and the associated situations, which should be captured) might come from past diagnoses, hospital visits, interactions with doctors, lab results (e.g., X-rays, MRIs, and EEG results), past medications, treatment plans, post-treatment complications, as well as personal reports and home measurements. There is extensive work on representing such entities in ontologies, although alternative formulations exist. This work can be built on and explored within the more open OKN environment. Considering the usual sources and their documentation, less work has been done to systematically capture the context of some of these medical and health measurements. One example is to consider attributes of recorded health information such as symptoms or diagnoses. These may carry a qualifying or epistemic value, such as "severe" or "painful" for symptoms and "suspected" or "probable" for diagnoses. Symptoms and diagnoses evolve over time and may be updated by different agents, and thus have a complex provenance. In addition, situational information is needed to understand the recorded information and to fill in or complete it. Methods for knowledge graph completion using context may involve suggesting missing entities or relations of interest, or missing types for entities.
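The point about qualified values and evolving provenance can be sketched as a small data structure. This is only an illustration under assumed names: the `Assertion` class, its fields, the helper functions and the sample history are hypothetical and are not drawn from any EHR standard or ontology.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assertion:
    """One recorded statement about a patient, with its context."""
    subject: str     # who the statement is about
    kind: str        # "symptom" or "diagnosis"
    value: str       # e.g. a disease or symptom name
    qualifier: str   # e.g. "severe", "suspected", "probable"
    agent: str       # who recorded it
    recorded: date   # when it was recorded


def provenance(history, subject, kind, value):
    """Chronological trail of who said what, and when, about one item."""
    matching = [a for a in history
                if (a.subject, a.kind, a.value) == (subject, kind, value)]
    return [(a.recorded, a.agent, a.qualifier)
            for a in sorted(matching, key=lambda a: a.recorded)]


def current_view(history):
    """Most recent assertion for each (subject, kind, value)."""
    latest = {}
    for a in sorted(history, key=lambda a: a.recorded):
        latest[(a.subject, a.kind, a.value)] = a  # later records overwrite earlier ones
    return latest


# Illustrative history only; not real patient data.
history = [
    Assertion("patient1", "symptom", "headache", "severe", "patient1", date(2018, 1, 8)),
    Assertion("patient1", "diagnosis", "DiseaseB", "suspected", "dr_a", date(2018, 1, 10)),
    Assertion("patient1", "diagnosis", "DiseaseB", "probable", "dr_b", date(2018, 2, 2)),
]

trail = provenance(history, "patient1", "diagnosis", "DiseaseB")
status = current_view(history)[("patient1", "diagnosis", "DiseaseB")].qualifier
```

Keeping every assertion rather than overwriting the diagnosis preserves the provenance trail, while `current_view` gives the latest qualified state that an analytic application would typically use.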

It would seem that there are ODPs of interest in such a diverse set of RDF data, and we might outline some, starting with real data and current Linked Data. Relevant work that can be leveraged includes Aranguren et al (2008), Martínez-Costa et al (2014), Martínez-Costa et al (2017) and Mortensen et al (2012). Applications of knowledge made open might include supporting real-time monitoring to identify health anomalies.

References

Aranguren, Mikel Egaña, et al. "Ontology Design Patterns for bio-ontologies: a case study on the Cell Cycle Ontology." BMC Bioinformatics. Vol. 9. No. 5. BioMed Central, 2008.

Chen, Henry W., et al. "Representation of Time-Relevant Common Data Elements in the Cancer Data Standards Repository: Statistical Evaluation of an Ontological Approach." JMIR Medical Informatics 6.1 (2018).

Luo J, Wu M, Gopukumar D, Zhao Y. Big data application in biomedical research and health care: a literature review. Biomedical Informatics Insights. 2016;8:1–10. doi:10.4137/BII.S31559.

Martínez-Costa, Catalina, Daniel Karlsson, and Stefan Schulz. "Ontology Patterns for Clinical Information Modelling." WOP. 2014.

Martínez-Costa, Catalina, and Stefan Schulz. "Validating EHR clinical models using ontology patterns." Journal of Biomedical Informatics 76 (2017): 124-137.

Jonathan Mortensen, Matthew Horridge, Mark A Musen, and Natalya Fridman Noy. 2012. Applications of ontology design patterns in biomedical ontologies. In AMIA