
Ontolog Forum

Session: Track B Session 2
Duration: 2 hours
Date/Time: Apr 12 2017 18:30 GMT
9:30am PDT/12:30pm EDT
5:30pm BST/6:30pm CEST
Convener: MikeBennett and AndreaWesterinen

Ontology Summit 2017 Track B Session 2

Video Conference URL: https://bluejeans.com/768423137

Meeting ID 768423137

Meeting webpage: http://bit.ly/2olx0WY

Chatroom: http://bit.ly/2lRq4h5

Please use the chatroom above. Do not use the video teleconference chat, which is only for communicating with the moderator.

When you use the Video Conference URL above, you will be given the choice of using the computer audio or using your own telephone. Some attendees had difficulties when using the computer audio choice. If this happens to you, please leave the meeting and reenter it using the telephone choice with access code 768423137.


Abstract

Context: Background knowledge and ontologies can be used to improve machine learning model selection, processing, data preparation and results.

Perception: Machine Learning requires knowledge of the domain being examined to define relevant hypotheses, infer missing information in data, choose relevant features, etc.

Motivation: Discuss the kinds of ontologies needed for ML, and their "ground rules" and requirements.

Please also see the Track's blog page and meeting page.

Agenda

  • Introduction
    • In the first Track B session on March 15, there were two presentations on combining ontologies with machine learning and natural language processing technologies in order to improve results. In the first, ontologies were combined with ML to improve decision support. The benefits included improving the quality of decisions, making decisions more understandable, and adapting the decision-making processes in response to changing conditions. The second combined ontologies with NLP in support of digital forensics and situational awareness. Concept extraction from natural language text was improved by using an ontology to isolate the meanings/semantics of the concepts and provide “artificial intuition” into the text.
    • In the second Track B session, we want to continue discussing the use of ontologies to improve machine learning understandability and natural language processing. The presentations cover Meaning-Based Machine Learning (MBML), which investigates how to get meaningful output from existing machine learning techniques; the use of FIBO and corporate taxonomies to extract and integrate information from data warehouses, operational stores and natural language communications; and driving knowledge extraction via a semantic model/ontology.
  • Speakers
    • Courtney Falk, Infinite Machines
      • Title: The Meaning-Based Machine Learning Project
      • Slides: PDF Format
      • Abstract: Meaning-based machine learning (MBML) is a project to investigate how to get meaningful output from existing machine learning techniques. MBML builds from the Ontological Semantics Technology (OST) where natural language resources map to an ontology. An application of MBML to detecting phishing emails provides some initial experimental results. Finally, future research directions are explored.
    • Bryan Bell, Expert System
      • Title: Leveraging FIBO with Semantic Analysis to Perform On-Boarding, Know-Your-Customer (KYC) and Customer-Due-Diligence (CDD)
      • Slides: PDF Format PPTX format
      • Abstract: The Financial Industry Business Ontology (FIBO) provides a common ontology and taxonomy for financial instruments, legal entities, and related knowledge. It provides regulatory and compliance value by ensuring that a common language can be used for data harmonization and reporting purposes. This session discusses applying the formal structure of FIBO and other corporate taxonomies to unstructured information in data warehouses, operational stores and natural language communications (such as news articles, research reports, customer interactions, emails, and product descriptions), in order to create new value and to aid in onboarding new customers, establishing a dependable know-your-customer process, and completing ongoing customer due diligence processes. FIBO provides the common language for bridging interoperability gaps and organizing content in a consistent way.
    • Tatiana Erekhinskaya, Lymba Corporation
      • Title: Converting Text into FIBO-Aligned Semantic Triples
      • Slides: PDF Format PPTX format
      • Abstract: Ontologies are playing a major role in federating multiple sources of structured data within enterprises. However, unstructured documents remain mostly untouched or require manual labor to be included in a consolidated knowledge management process. At Lymba, we are developing a knowledge extraction tool that automatically identifies instances of concepts/classes and the relations between them in text. The extraction is driven by a semantic model or ontology. For example, using FIBO terminology, the system recognizes time/duration constraints in contracts, money values, and their meaning (transaction value, penalty, fee, etc.), and links them to the parties in the contract. The extracted knowledge is represented in the form of semantic triples, which can be persisted in an RDF store to allow integration with other sources, inference, and querying. Another useful add-on is a natural language querying capability, whereby a query such as “Find clauses with time constraints for payor” is automatically converted into semantic triples, and then into SPARQL. This talk provides an overview of Lymba’s knowledge extraction pipeline and knowledge representation framework. Semantic parsing and triple-based representation provide a bridge between semantic technologies and NLP, leveraging inference techniques and existing ontologies. We show how Lymba’s Semantic Calculus framework allows easy customization of the solution to different domains.
  • Discussion
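The Lymba abstract above describes extracted knowledge held as semantic triples and a natural-language question turned into triple patterns and then into SPARQL. The following is a minimal, hypothetical Python sketch of that querying idea, not Lymba's implementation: the `ex:` vocabulary, the sample triples, and every function name are illustrative assumptions, and the question patterns are written by hand where a real system would derive them from the text.

```python
from typing import Iterable, Optional

# A triple is (subject, predicate, object). All identifiers use a
# hypothetical "ex:" prefix; they are NOT real FIBO or Lymba terms.
Triple = tuple[str, str, str]

# Hypothetical triples an extractor might emit for a clause such as
# "The payor shall pay a late fee of $500 within 30 days."
TRIPLES: set[Triple] = {
    ("ex:clause1", "ex:hasParty", "ex:payor"),
    ("ex:clause1", "ex:hasTimeConstraint", "ex:duration30d"),
    ("ex:duration30d", "ex:days", "30"),
    ("ex:clause1", "ex:hasMonetaryAmount", "ex:amount1"),
    ("ex:amount1", "ex:value", "500 USD"),
    ("ex:amount1", "ex:playsRole", "ex:Fee"),  # the money value's role
    ("ex:clause2", "ex:hasParty", "ex:payee"),
}


def is_var(term: str) -> bool:
    """Query variables start with '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple,
          binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in result and result[p] != t:
                return None  # variable already bound to something else
            result[p] = t
        elif p != t:
            return None  # constant term does not match
    return result


def query(patterns: list[Triple],
          triples: Iterable[Triple]) -> list[dict[str, str]]:
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    data = list(triples)
    bindings: list[dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in data
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


def to_sparql(select: list[str], patterns: list[Triple]) -> str:
    """Render the same patterns as SPARQL text for an RDF store."""
    where = " .\n  ".join(" ".join(p) for p in patterns)
    return (
        "PREFIX ex: <http://example.org/>\n"
        f"SELECT {' '.join(select)} WHERE {{\n  {where} .\n}}"
    )


# "Find clauses with time constraints for payor", as triple patterns.
QUESTION: list[Triple] = [
    ("?clause", "ex:hasParty", "ex:payor"),
    ("?clause", "ex:hasTimeConstraint", "?constraint"),
]

if __name__ == "__main__":
    for row in query(QUESTION, TRIPLES):
        print(row)
    print(to_sparql(["?clause", "?constraint"], QUESTION))
```

Running it prints one binding (clause1 with its 30-day constraint) followed by the equivalent SPARQL text. In practice the triples would be loaded into an RDF store and the SPARQL passed to its query engine instead of matching patterns in Python; the in-memory matcher only shows what the query means.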

Attendees

Proceedings

[12:22] KenBaclawski: Video Conference URL: https://bluejeans.com/768423137

[12:30] Donna Fritzsche: Hi Everyone!

[12:31] AndreaWesterinen: Hi!

[12:31] Tatiana Erekhinskaya: Hi!

[12:32] Ken Baclawski: Hello, everyone. We will start the meeting momentarily.

[12:34] Courtney Falk: Guten Tag [Good day], Kiki.

[12:34] Kiki Hempelmann: Hi, Court

[12:34] Kiki Hempelmann: Been a while!

[12:35] Courtney Falk: Indeed. I hope Texas is not too hot yet.

[12:45] KenBaclawski: I have found that Google Chrome works better than Firefox for BlueJeans.

[12:47] Mark Underwood: No love for me - Loops @ "Loading presentation" - Using Chrome

[12:48] KenBaclawski: The slides are available on the meeting page at http://ontologforum.org/index.php/ConferenceCall_2017_04_12

[12:49] Mark Underwood: Better luck w/ the standalone app

[12:50] AndreaWesterinen: Same for me, looping at "Loading presentation"

[12:57] ToddSchneider: What about phishing emails that use bogus URLs?

[13:00] BobbinTeegarden: One of the references has 'Fuzzy ontology' -- what is that?

[13:04] AndreaWesterinen: Thank you, Courtney! That was interesting!

[13:06] Courtney Falk: @BobbinTeegarden: This would be where you can find the Fuzzy Ontology paper: http://ieeexplore.ieee.org/document/5548416/

[13:07] Courtney Falk: Essentially it's looking to apply fuzzy logic to the structures in Ontological Semantics.

[13:11] Courtney Falk: @AndreaWesterinen: There's more work related to automatically generating text meaning representations. That's work belonging to Sergei Nirenburg, Marge McShane, and Stephen Beale (among others). A lot of that work was done at the University of Maryland, Baltimore County. Their team is now at Rensselaer.

[13:16] MikeBennett: www.intelligenceapi.com is the live demo site which people are welcome to visit.

[13:18] MikeBennett: Idea: could we use this site to analyze the chat logs from these sessions, as an aid towards the Summit outcomes (synthesis and Communique)?

[13:21] TerryLongstreth: An anecdote illustrating the vagaries of tokenising meaning: Shortly after arriving in Germany, with a limited knowledge of the language, we requested a 'hund-sack' to collect our leftovers. The waitress (perhaps consistent with local custom) was unfamiliar with the notion of scraping plates into packages to carry home, and couldn't parse our phrase at all. She blushed brightly, rushed away, and returned shortly with the manager (who spoke pretty good English). He declared that the restaurant served only normal meats (pork, chicken, rabbit, beef, etc.), never dog, and had no idea what we wanted with the canine scrotum anyway.

[13:22] MikeBennett: Question: The system identifies that Monaco is a place in Europe. Would a knowledge base with countries etc. enable it to identify that Monaco is a city state?

[13:28] BobbinTeegarden: Does the engine learn more and more contexts as it is used, or do additional 'contexts' have to be manually added?

[13:29] MikeBennett: The word "bear" can also apply to a kind of market sentiment in finance (bear market versus bull market). Unfortunately we don't (yet) have market sentiment concepts in FIBO but we should do one day.

[13:30] TerryLongstreth: @mike: what about 'grin and bear it?'

[13:33] MikeBennett: @Terry I guess there may be some etymological relation between the verb "to bear" (which I see has a separate and opposite meaning as exemplified in US sport "to bear down" versus "to put up with"). A bear market is one where people are a bit afraid to invest.

[13:33] NancyWiegand: On slide 9, how does Cogito know the context of each term, such as 'stock'? It doesn't seem to come from the verb itself because the existing verbs could be interchanged. Note, I don't have audio so perhaps this was stated.

[13:42] AndreaWesterinen: Building on Nancy's question, it seems that an initial parse of the sentence makes a big difference. Is the sentence first parsed or is it semantically analyzed for parse options?

[13:48] MikeBennett: Tatiana makes a good ontological point about fees, penalties etc. being "roles" of monetary amounts. The semantics of roles and the contexts in which they exist (e.g. the contract) is important in finance.

[13:52] Bryan Bell: Bobbi - yes. They can be added manually, or automatically by the analysis system, with or without an SME.

[13:55] Bryan Bell: Terry: there are 22 versions of "bear" within the ontology. Your version is included.

[13:56] MikeBennett: @Tatiana what is a URL for the web page the public can play with? (the root of the URI you are showing just goes to some kind of holding page)

[13:56] Bryan Bell: NancyWiegand - slide 9: the version of the term being analyzed is derived from a combination of linguistic analysis and the language ontology.

[13:58] Bryan Bell: @AndreaWesterinen - the linguistic analysis is step one, and is an important key to correct disambiguation.

[13:58] AndreaWesterinen: @Tatiana, How would you later distinguish "cheese" as a food, from the acronym for starter Heroin?

[14:00] BobbinTeegarden: @Tatiana, Is there a way to visualize the ontology in a graph, or is it in rdf or owl so you could visualize it in another tool?

[14:03] Mark Underwood: To any speaker: Anyone want to speculate how blockchain might interact with ontologies, e.g., distributed ontologies, with contractually arrived-at information sharing/hiding?

[14:06] MikeBennett: Indeed the use of street names for drugs is intended to mask the nature of what is being talked about, to normal human speakers.

[14:09] Tatiana Erekhinskaya: lymba.com

[14:09] Courtney Falk: @Mark Underwood: I'm learning about blockchain at the moment. But I'm not sure there are any compelling applications of blockchain to ontologies. Short version: I don't think there's anything blockchain offers to these problems that other systems don't already do better.

[14:11] Mark Underwood: Sorry, not on the bridge

[14:12] TerryLongstreth: Bryan: I'd express that as "The token 'bear' has 22 manifestations". How many concepts are there, and how unambiguously are the concepts described? (I'd guess some represent intersections or ambiguous overlaps among several concepts.)

[14:12] MikeBennett: For further general questions to the 3 speakers, please click on the Raise Hand button to your right.

[14:14] Mark Underwood: OK, thx

[14:14] Mark Underwood: FYI It's very big in fintech already, because it removes certs, which are highly tedious!

[14:15] Courtney Falk: The technical, linguistic term for situations where you have multiple meanings associated with one word/token is "polysemy" if you want to dig deeper.

[14:16] Courtney Falk: @Mark Underwood: Yep. There are hidden dangers yet. See the case of the DAO distributed investment fund that got robbed when someone found a logical flaw in the way that transactions were applied.

[14:19] Mark Underwood: Few papers on this (blockchain + ontology) topic, not fully satisfying https://semanticblocks.wordpress.com/

[14:21] Mark Underwood: I only mention this as we're in the FIBO space... "A first effort to standardize this technology is the BLONDiE (Blockchain Ontology with Dynamic Extensibility) ontology. This OWL ontology can be used to express in RDF different fields of the structures of Ethereum or Bitcoin. It can also be extended to cover other Blockchain technologies. In addition, BLONDiE being OWL has the ability to make explicit knowledge available"

[14:21] MikeBennett: When people tell jokes they are deliberately trying to confound the kinds of mechanisms we are talking about here (disambiguation of words).

[14:22] Courtney Falk: @MikeBennett: This is another case of Victor Raskin appearing. He has a body of work devoted to humor if you look him up.

[14:22] AndreaWesterinen: @Tatiana, lymba.com just takes you to a landing page. Where do you go from there? I clicked on Demos and then the text analytics tool and browse our general ontology, but the pages won't load.

[14:22] Courtney Falk: "Generalized Theory of Verbal Humor" I think it's called. Kiki can actually speak more to it.

[14:23] MikeBennett: Terry asks how the various systems are able to identify when they are dealing with a neologism

[14:23] Tatiana Erekhinskaya: @AndreaWesterinen Please try https://www.lymba.com/text-analytics-demos.html

[14:24] MikeBennett: Courtney: Neologism = "unattested input"

[14:24] AndreaWesterinen: @Tatiana, I got there, but the links to the tool or browsing the ontology open new tabs which don't load anything.

[14:24] AndreaWesterinen: Tried both Firefox and Chrome

[14:25] MikeBennett: Tatiana: similar problem to merging graphs / ontologies together; take account of context in text or in graph.

[14:26] MikeBennett: Bryan: New coinages enter the language all the time, versus when people try to hide what's talked about in e.g. drugs.

[14:27] AndreaWesterinen: @Mike, We are running out of time.

[14:27] ToddSchneider: Couldn't neologisms be used to identify potential illicit activities?

[14:27] MikeBennett: @Mark Blockchain is really important in the FIBO / finance space, where we are exploring use of FIBO as conceptual ontology for specifying smart contracts. Whether there is overlap with NLP is another question.

[14:27] AndreaWesterinen: Maybe we could move the discussion to email??

[14:27] Courtney Falk: @MikeBennett: Correct. Thanks. I was also thinking about how to include things like novel senses of "cheese" appearing when we already have a "cheese" lexeme described.

[14:29] TerryLongstreth: Thanks all.

[14:29] MikeBennett: We can leave this window going for a bit...

[14:29] Tatiana Erekhinskaya: @AndreaWesterinen I just asked several people around the globe to check it, and they say it works. I would say Chrome is better. Please let me know if you want to share the screen.

[14:30] Mark Underwood: thanks all! #ontologysummit if you socialize on Twitter

[14:30] TerryLongstreth: I'm still on Windows XP and only Chrome supports BlueJeans

[14:30] AndreaWesterinen: @Tatiana, Maybe my work location is blocking it. :-( I will try from home tonight.

[14:31] AndreaWesterinen: @Terry, XP??!!!

[14:31] MikeBennett: Thanks to all our speakers for excellent presentations and insights, and to participants for insightful questions and conversation.

[14:31] Tatiana Erekhinskaya: @AndreaWesterinen, May I please contact you and check whether it worked?

[14:31] AndreaWesterinen: @Mike +1

[14:32] TerryLongstreth: I'd have to buy (and configure) a whole new computer. That's a stretch for my pension.

[14:50] KenBaclawski: @TerryLongstreth: There is a BlueJeans app that is available for Android and iPhone.

Resources

Video Recording

Previous Meetings
