Ontology Summit 2016: Overview of Semantic Interoperability in the GeoSciences - Thu 2016-02-25     (1)

Abstract     (1B)

  • Context: There is a general belief that standards for interoperability components, including catalogs, vocabularies, services and information models, are converging. The application of ontologies to provide semantics for this interoperability has been an active area of research in several disciplines, including the GeoSciences, where it now plays a central role in tackling the problems of semantic heterogeneity.     (1B1)
  • Perception: In an era of Big Data and Big Science, diverse GeoScience data are increasingly being collected across many domain areas and at wide spatial scales.     (1B2)
    • Increasingly interdisciplinary work requires an efficient way of handling system, service and data heterogeneity. Valid, formal semantics are perceived as a way of addressing these challenges.     (1B2A)
    • The GeoSciences are broadly defined to include not only the Earth Sciences but also geographically distributed sensor networks, such as air traffic control and power grids.     (1B2B)
  • Motivation: Report on activities, such as NSF’s EarthCube, in which different GeoScience domains work on semantic interoperability issues, in order to promote understanding of the state of the art across different approaches and of the challenges that remain:     (1B3)
    • What range of GeoScience ontologies is used across the domain?     (1B3A)
    • How have ontologies been enhanced and expanded for use and reuse?     (1B3B)
    • What issues of knowledge building and reuse have been noted?     (1B3C)
    • What new standards might be     (1B3D)

Agenda     (1C)

Proceedings     (1E)

[12:34] ConradBeaulieu: Trying to find the presentation slides.     (1E1)

[12:34] LeoObrst: I can't access BrandonWhitehead's slides: no access to Google Drive here, for security reasons.     (1E2)

[12:35] BrandonWhitehead: Sorry, Leo.     (1E3)

[12:35] MarshallMa: Slides are in the agenda. I can access Brandon's slides.     (1E4)

[12:35] BrandonWhitehead: I can send them via email (and fix the link after the talk) if that suits?     (1E5)

[12:36] LeoObrst: @BrandonWhitehead: sure, that would work. I'm at     (1E6)

[12:37] ConradBeaulieu: Got the slides now     (1E7)

[12:38] BrandonWhitehead: @LeoObrst: sent.     (1E8)

[12:42] Mark Underwood: I also reposted Brandon's deck at     (1E9)

[12:44] BrandonWhitehead: @Mark Underwood: Thank you.     (1E10)

[12:45] Donna Fritzsche: @gary - on slide 5 - what is meant by "lack of ground truth"? (I can imagine several meanings)     (1E11)

[12:46] BobbinTeegarden: Slide 8: aren't they using subclassing as decomposition? Should it be 'partOf'?     (1E12)

[12:50] Donna Fritzsche1: slide 11 is very useful - Is #1 meant to be Use cases?     (1E13)

[12:51] Gary Berg-Cross: @donna yes use cases.     (1E14)

[12:52] Mark Underwood: Twitter hashtag #ontologysummit     (1E15)

[12:53] Gary Berg-Cross: @Bobbin Yes, there are issues with these simple ontologies that take off from vocabularies and that is part of the challenge.     (1E16)

[12:53] Donna Fritzsche: I like the suggestion that the term "semantic models" be used. For me, it clears thru some of the fog.     (1E17)

[12:57] Ruth Duerr: Brandon - May I use this slide (attributed of course) in my class this evening? I am giving a lecture that talks about metadata, vocabularies, ontologies, etc. and this is a much better slide than what I have at the moment     (1E18)

[12:57] Ruth Duerr: slide 4 that was     (1E19)

[13:01] Donna Fritzsche: Curating is common for the taxonomy world - and agreed it is useful and necessary for the ontology world. (@brandon)     (1E20)

[13:09] BobbinTeegarden: @Brandon Are you aware of the work that is being done by the ICES Foundation (International Center for Earth Simulation)?     (1E21)

[13:11] BrandonWhitehead: @BobbinTeegarden: A bit yes, but perhaps I should take a deeper look?     (1E22)

[13:15] BrandonWhitehead: @Ruth: Sure. If you want the actual slide (not PDF) email me and I can send it to you -     (1E23)

[13:15] BrandonWhitehead: +1 on badges!     (1E24)

[13:16] TaraAthan: Re: scholar's contribution is data ... so which do we consider more valuable - Einstein's data or theories?     (1E25)

[13:16] BobbinTeegarden: @Brandon You could contact Bob Bishop (contact info on site), he loves to talk about it, is an integrator also.     (1E26)

[13:17] BrandonWhitehead: cheers for that @Bobbin!     (1E27)

[13:18] TerryLongstreth: @Tara - Maybe not in the conventional sense, but Einstein's theories are the Metadata.     (1E28)

[13:20] MarshallMa: @Ruth - Besides NSIDC, do you know other research data repositories that deployed     (1E29)

[13:20] LeoObrst: @RuthDuerr: fine by me; Brandon's is a composition of two of my slides.     (1E30)

[13:23] Mark Underwood: Related work by ISO TC37 was presented by Sue Ellen Wright here in '11     (1E31)

[13:26] John Sowa: The complexity of sharing data among all these tools makes the problem of natural language understanding look simple.     (1E32)

[13:26] Donna Fritzsche: What is usable data?     (1E33)

[13:28] Fran Lightsom: Are there use cases that give us a clue what we should call a data set?     (1E34)

[13:29] TerryLongstreth: @Ruth Research Data Alliance is working on a proposed standard for Data Management Plans, that should allow/encourage/force projects to answer some of her questions about data descriptions and usability.     (1E35)

[13:31] RaviSharma: Is metadata from HDF E     (1E36)

[13:32] RaviSharma: usable for search     (1E37)

[13:41] Ruth Duerr: Not all data is amenable to publishing as open linked data - for example, MODIS 500 meter snow data... each image is a 2400 x 2400 array of pixels, with 4 fields per pixel, and 1 image every 5 minutes ever since 2000! The metadata could be, though...     (1E38)

[13:41] Donna Fritzsche: @Adila - can you give an example of a Content Pattern?     (1E39)

[13:41] BrandonWhitehead: @Adila: do you know the average, or typical, depth of the content patterns in the collection?     (1E40)

[13:44] John Sowa: @Terry These talks remind me of the statement you made many years ago: "Any one of these tools is a tremendous aid to productivity, but any two of them together will kill you."     (1E41)

[13:45] John Sowa: Can anything you have done (or seen) since then solve that problem?     (1E42)

[13:46] John Sowa: Instead of linking silos, it seems that every tool creates its own silo.     (1E43)

[13:47] Donna Fritzsche: The term "content" is used very differently in the rest of the internet world (Website design, UX, Content Strategy) - I think the term domain model or object pattern would be more accessible to various stakeholders (@Adila)     (1E44)

[13:51] Ruth Duerr: @MarshallMa: Several DAACs were thinking of doing it, but I don't know whether they actually did     (1E45)

[13:53] Ruth Duerr: @Fran Here is a paper on the topic of dataset definitions - Renear, A. H., Sacchi, S., & Wickett, K. M. (2010). Definitions of dataset in the scientific and technical literature. Proceedings of the American Society for Information Science and Technology, 47(1), 14. Available:     (1E46)

[13:53] Adila Krisnadhi: prototypical GeoLink data     (1E47)

[13:53] Adila Krisnadhi: Example: Peter Wiebe:     (1E49)

[13:54] MarshallMa: @Ruth - Thanks. I look forward to hearing Guha's presentation on March 03 about updates of     (1E50)

[13:54] Adila Krisnadhi: @Brandon: the content patterns do not typically contain deep subsumption hierarchy     (1E51)

[13:54] Mark Underwood: @John quoting @Terry sadly hilarious. "API-first" may just be the latest desperate remedy (Tibco Mashery to the contrary)     (1E52)

[13:56] Ruth Duerr: @Fran - in my mind, the issue is that helping users discover data is being conflated with data providers' needs to get attribution/credit for the data they provide. Those concerns need to be disambiguated... We don't have good mechanisms for that at the moment, though the RDA Dynamic Citation recommendations might help here.     (1E53)

[13:56] Adila Krisnadhi: @Donna: thanks for the feedback. Indeed, the term 'content' may be understood differently. The term 'content pattern' actually originated in Aldo Gangemi's ISWC 2005 paper, which introduced the idea of ontology design patterns in general.     (1E54)

[13:58] Donna Fritzsche: @adila a similar set of conventions can be found at: (Learning Resources and documentation). I found that this specification is very clearly documented, although the last time I checked (2013) - it was hard to find implementations of it.     (1E55)

[13:59] Donna Fritzsche: @adila thanks for the reference to the term     (1E56)

[14:00] Adila Krisnadhi: @Ruth: indeed. Not all data is published as linked data, and in this case, it is up to the data providers whether to expose the data or the metadata as Linked Data. In most cases, providers would start with the metadata first. The choice also depends on the exact data discovery needs of the community.     (1E57)

[14:01] Adila Krisnadhi: @Donna: thanks!     (1E58)

[14:02] Ruth Duerr: @John Sowa: But they are silos that support a particular user community's needs... for example, arcGIS supports folks for whom maps are the appropriate visualization mechanism, but arcGIS is totally useless for modelers who need gridded data at reasonable time steps (i.e., the model is continuous in time and space)     (1E59)

[14:06] LeoObrst: Have to leave. Thanks, folks!     (1E60)

[14:06] Donna Fritzsche: Thanks Leo!     (1E61)

[14:07] BrandonWhitehead: @Adila and @Donna: IIRC, Gangemi was inspired by (or borrowed from) the notion of patterns described in Christopher Alexanders A Pattern Language     (1E62)

[14:08] BrandonWhitehead: Ahem. Christopher Alexander's book titled: _A Pattern Language_     (1E63)

[14:08] Donna Fritzsche: Thanks Brandon, I am aware of this work.     (1E64)

[14:09] John Sowa: @Ruth Yes, but everybody belongs to many different communities. I use the hospital example: the surgeons, specialists, general practitioners, nurses, pharmacists, administrators, staff... and most importantly patients. How can they collaborate?     (1E65)

[14:09] Adila Krisnadhi: @Brandon: Yes.     (1E66)

[14:10] Donna Fritzsche: @John - Data Encapsulation, context-aware, self-aware objects     (1E67)

[14:12] Mark Underwood: Matthew - can u say more about the event ontologies? best prax? Domain-specific? We wrestled w/ this one last year RET IoT, also for big data (where is "the event"?) ok to email me     (1E68)

[14:12] Ruth Duerr: @John Sowa: True, but while I can imagine all of those folks collaborating around a common health status-type application with test results, historical health information, etc., I can't imagine them all using that same application to then think about community road planning...     (1E69)

[14:13] Fran Lightsom: @Ruth -- thinking thinking -- data values getting recombined into multiple data sets as they move from acquisition through processing, analysis, re-use in other collections. "data set" almost becomes an event instead of a thing, but that's not right. Thanks for the thinking!     (1E70)

[14:14] Donna Fritzsche: @Fran - the processing of the data could indeed be recorded as an event with the output being a new data set     (1E71)

[14:15] Fran Lightsom: @Donna -- that's it, thanks!     (1E72)

[14:16] Ruth Duerr: @Fran: Actually I think you are onto something with that observation... different audiences have different needs and many repositories are now generating products on the fly based on user needs     (1E73)

[14:17] Donna Fritzsche: @Fran - in fact, preserving the provenance of the processing (the event) - is important - what software version, etc. created the new data set.     (1E74)

[14:18] Gary Berg-Cross: BTW, Marshall Ma will be one of the speakers at the 2nd GeoScience track session:     (1E75)

Title: SEM+: a tool for concept mapping in geoscience     (1E76)

Abstract: The number of geoscience ontologies and vocabularies has been growing rapidly. The interconnections between concepts in these ontologies and vocabularies will make it possible to perform data integration and identity recognition, which is crucial in developing applications that use data from multiple sources. In this presentation we will introduce a tool for performing concept mapping, called SEM+. The core of SEM+ is the Information Entropy based Weighted Similarity Model for computing semantic similarity between entities and suggesting possible links. We will also introduce a few other techniques, such as entity grouping, that were used to improve efficiency in the process of similarity computation.     (1E77)

[14:19] MattMayernik: @Mark - here is the Event ontology I mentioned. The VIVO-ISF incorporates a few classes from this, but as I said, we are looking for something more appropriate.     (1E78)

[14:20] Mark Underwood: @Matt - thx!     (1E79)

[14:20] BrandonWhitehead: @Fran and @Donna: Does it make sense then to think of a data set orthogonally? i.e. it is an object in an object model, and an event or state in a process model. One view for the artefact itself, and another for the lineage or provenance of the artifact.     (1E80)

[14:21] Donna Fritzsche: @brandon - to me, that makes absolute sense - I think of it as one big object-oriented programming problem     (1E81)

[14:22] Mark Underwood: @Adila - Wondering if PROV-O could be used to track provenance of the data providers on your slide 5? Kind of off topic from your mission, but a problem others face in many big data domains     (1E82)

[14:23] Donna Fritzsche: @brandon - so that is where being context-aware and self-aware becomes important, with data encapsulation when called for.     (1E83)

[14:25] Adila Krisnadhi: @Mark: it is certainly possible. That's actually one item on our wishlist, which we hope to fulfill as part of the effort.     (1E84)

[14:25] Donna Fritzsche: For Ruth's use cases, it might require more reaching over the borders (silo walls) than the hospital case would require. More work would have to be done to support the overarching goals as opposed to the local goals.     (1E85)

[14:27] Fran Lightsom: A data set is a collection of data that is partially defined by the purpose for which it was collected and the process that was used to collect it?     (1E86)

[14:27] TaraAthan: Refactoring datasets for new purposes depends on there being detailed archiving of the data lineage. Since it is not possible to predict the future needs that a dataset might be put to, the key goal seems to me to be to archive the lineage at the necessary level of detail.     (1E87)

[14:27] Mark Underwood: @Adila - Would be very much interested in that effort!     (1E88)

[14:28] Gary Berg-Cross: @John Ruth is the one who mentioned the NL and conceptual maps connection.     (1E89)

[14:29] Donna Fritzsche: @john esco effort has many languages     (1E90)

[14:29] Donna Fritzsche: backed up by committee agreement     (1E91)

[14:29] BobbinTeegarden: @John What are your thoughts on solutions?     (1E92)

[14:29] Mark Underwood: @John the problem goes beyond ontologists; it's a (perhaps The) software engineering problem     (1E93)

[14:30] Mark Underwood: Thanks, everyone!     (1E94)

[14:30] BrandonWhitehead: Thanks for listening everyone!     (1E95)

[14:31] Gary Berg-Cross: Thanks all!!     (1E96)

[14:31] Adila Krisnadhi: Thanks a lot!     (1E97)

[14:31] MattMayernik: Thanks all, and thanks Gary for the invitation!     (1E99)

[14:32] KenBaclawski: The meeting is adjourned.     (1E100)

