Session: Paco Nathan
Duration: 1 hour
Date/Time: 04 March 2020 17:00 GMT (9:00am PST / 12:00pm EST / 5:00pm GMT / 6:00pm CET)
Convener: Gary Berg-Cross
Track: How


Knowledge graphs (KGs), closely related to ontologies and semantic networks, have emerged in the last few years as an important semantic technology and research area. As structured representations of semantic knowledge stored in a graph, KGs are lightweight versions of semantic networks that can scale to massive datasets such as the entire World Wide Web. Industry has devoted a great deal of effort to developing knowledge graphs, and they are now critical to the functioning of intelligent virtual assistants such as Siri and Alexa. Research communities where KGs are relevant include Ontologies, Big Data, Linked Data, Open Knowledge Network, Artificial Intelligence, and Deep Learning, among many others.     (2A)

Agenda     (2B)

Presenter:     (2B2)

Known as a "player/coach", with core expertise in data science, natural language processing, machine learning, and cloud computing; 35+ years of tech industry experience, ranging from Bell Labs to early-stage start-ups.     (2B3)

Co-chair: Rev and JupyterCon. Advisor for: NYU Coleridge Initiative, IBM Data Science Community, Amplify Partners, Recognai, Primer. Formerly: Director, Community Evangelism @ Databricks and Apache Spark. Cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise.     (2B4)

Abstract:     (2B5)

This talk explores the Rich Context project, based in the Coleridge Initiative at NYU Wagner. The Coleridge Initiative produces the ADRF (Administrative Data Research Facility) platform, which is currently used by 50 federal, state, and local agencies in the US to provide a FedRAMP-compliant environment on GovCloud for analyzing sensitive cross-agency data. ADRF was cited as the first example of Secure Access to Confidential Data in the final report of the Commission on Evidence-Based Policymaking.     (2B6)

Rich Context is a public knowledge graph that complements the ADRF. Its goal is to address questions that empirical researchers face: for a given dataset, who has worked with the data before, what methods and code were used, and what results were produced? Moreover, which datasets are typically used together? How can we measure the links from curated datasets to research publications, and onward to their impact on public policy?     (2B7)

In terms of ontology, one challenge has been to mix controlled vocabularies for data catalogs, citations, and subject headings in ways that allow effective machine learning applications to be built on the graph. Another ongoing challenge has been to harmonize metadata coming from a wide variety of sources.     (2B8)

Innovations in the project include: a team process for collaboratively curating the graph through GitHub, which also trains the NYU grad students who contribute to the project; an open source library for federated queries across several discovery service APIs, used to add metadata to the graph; a public machine learning competition that engages AI research teams worldwide to infer missing metadata (e.g., links between datasets and publications); and a platform that uses human-in-the-loop feedback from domain experts to add or correct metadata in the graph.     (2B9)

The workflow for managing the Rich Context knowledge graph is open source, and an early proof-of-concept UI is online at     (2B10)

Conference Call Information     (2C)

Attendees     (2D)

Discussion     (2E)

[12:11] Ravi Sharma: Paco - do you use NIEM for apps that use 30 agencies' datasets?     (2E1)

[12:12] Mark Underwood @knowlengr: @Gary Is NIST part of Coleridge?     (2E2)

[12:17] Ravi Sharma: Paco- are metadata of datasets used as primary parameters for sub-setting search by constructing Knowledge Graphs?     (2E3)

[12:21] Ravi Sharma: Paco - how do you construct KGs, or do agencies have both datasets with metadata and also their own KGs? Do you provide interconnections among them, i.e., linking KGs?     (2E4)

[12:23] Gary: @Mark Not that I know. Ram may know more.     (2E5)

[12:24] Ravi Sharma: Paco - what is the result of meta-learning using autoML?     (2E6)

[12:27] Ravi Sharma: Paco - what is the relationship of Metadata with Knowledge Graphs?     (2E7)

[12:30] John Sowa: The idea of "Manual data entry and curation of data" sounds like a throwback to punched card data.     (2E8)

[12:30] Ravi Sharma: Paco - Is slide 9 a KG using Rich context?     (2E9)

[12:30] John Sowa: What kind of tools are they using? If any?     (2E10)

[12:31] janet singer: Helpful perspective that we may be 50 years into work with KGs, but that is arguably still early days and there is still a lot to learn     (2E11)

[12:33] David Eddy: Am I correct to believe that use of "ML" means the analysis is looking at DATA & NOT at the information (software) systems that produce the data?     (2E12)

[12:34] Leia Dickerson: What specific aspects of ontology/KG work do you rely on librarians for?     (2E13)

[12:36] Ravi Sharma: Paco - How do you track use of Datasets or use of Metadata Queries to publications and then impact on policies? correlation and accuracies?     (2E14)

[12:36] Mike Bennett: @Leia Dickerson a lot of people underestimate the importance of library science in creating ontologies, approaching it as a technology effort only. Think water versus plumbing. Library science is all about things like classification theory, semantics (the meanings of things) and so on. Needed for well-formed ontologies.     (2E15)

[12:37] Leia Dickerson: @Mike Bennett: Glad to see your comment, and Paco's note in his slide, as a librarian myself.     (2E16)

[12:37] David Eddy: @MikeB... from my side of the table, so-called library science is not at all present in the world of software     (2E17)

[12:38] David Eddy: @MikeB... neither system folks nor library folks seem to be aware of the existence of the other. Much less what each profession could bring to the other.     (2E18)

[12:38] Mike Bennett: @David I know, that's the problem. I just presented a 2-day training course in semantics and only mentioned software in terms of how you would implement semantics. A lot of people think you can turn something into OWL and it is magically meaningful.     (2E19)

[12:39] ToddSchneider: How is semantic consistency maintained or enforced?     (2E20)

[12:40] Leia Dickerson: This is amazing. Great use case (the first one).     (2E21)

[12:40] David Eddy: @ToddS... hard work. Remember... we're just beginning to talk about the significance of DevOps     (2E22)

[12:40] Mike Bennett: @Leia Dickerson thanks - I was also delighted that he mentioned this. @David indeed; we keep seeing 'ontology' developers learning the hard way while unaware of literally 3K years of work on this.     (2E23)

[12:40] Leia Dickerson: Highly relevant to my work.     (2E24)

[12:40] Ravi Sharma: Paco and Leia Dickerson- Do the librarians use any particular domain vocabularies for creating Ontologies?     (2E25)

[12:40] Mark Underwood @knowlengr: @All FYI I am on the IEEE P2675 DevOps standards group if you want a deeper dive on that     (2E26)

[12:41] Mark Underwood @knowlengr: Paco - Is this approach used for clinical trial data?     (2E27)

[12:41] David Eddy: @RaviS... librarians (again to best of my experience) expect THEY are the controllers of metadata. May work in a library, but not around software libraries.     (2E28)

[12:41] Ravi Sharma: Mark - I have same Q +1     (2E29)

[12:42] David Eddy: @Mark... what scares me about DevOps is that its acolytes seem to think it's a new discovery.     (2E30)

[12:42] janet singer: @David @Mike: Arguably the biggest reason so little progress has been made since the 1970s (as John points out) is that efforts have been distracted by one silver bullet fad after another including the shift to ontologies     (2E31)

[12:42] Mike Bennett: @Janet +1     (2E32)

[12:43] ToddSchneider: Janet,     (2E33)

[12:43] David Eddy: @All... to say nothing of the minor discretion that finance expenses systems.     (2E34)

[12:43] ToddSchneider: Janet, 'progress' in what?     (2E35)

[12:43] David Eddy: "distraction"     (2E36)

[12:43] Leia Dickerson: @Ravi--Unsure offhand, as I'm not a cataloger by training. I know there are MODS, PREMIS, and some other standards, but not sure if they rise to the level of ontology.     (2E37)

[12:45] Leia Dickerson: @David-- yes we definitely are all about metadata. There are several standards in our field, but only a few would truly be ontological I think.     (2E38)

[12:46] janet singer: @Todd: Hard to describe the field without using fad language, but going back to terminology from the 1970s is helpful to minimize marketing hype. Progress in library science - supported technology?     (2E39)

[12:46] David Eddy: @Leia - but library metadata & software systems metadata are distant cousins... unknown to each other     (2E40)

[12:47] ToddSchneider: Janet, which 'field'?     (2E41)

[12:47] Mark Underwood @knowlengr: @Janet is "fad language" a domain-independent dialect?     (2E42)

[12:49] Mark Underwood @knowlengr: @David RE DevOps. well, the speed is at least novel. look at the updates to the Google / Apple app store and imagine that velocity a couple decades ago     (2E43)

[12:49] Mark Underwood @knowlengr: @David It's at least novel in tech industry unicorn settings     (2E44)

[12:49] ToddSchneider: Paco, what is on the ceiling in the room you're transmitting from?     (2E45)

[12:50] John Sowa: When you talk about any field, such as library science, there is a huge gap between research and practice.     (2E46)

[12:50] BobbinTeegarden: How do you create a 'context' and what is inside and outside of it?     (2E47)

[12:51] janet singer: @Todd: What language would you use to describe the area of work being discussed here?     (2E48)

[12:51] John Sowa: There is some advanced AI done by people who earned PhDs in AI and went into lib. sci.     (2E49)

[12:51] ToddSchneider: Janet, English.     (2E50)

[12:52] David Eddy: @JohnS... duly note... 200 years between Isaac Newton's alchemy (as bleeding edge of scientific knowledge) & Mendeleev's Periodic Table.     (2E51)

[12:52] John Sowa: But the amount of publication in all fields makes it extremely difficult for people to discover what has been done.     (2E52)

[12:53] janet singer: @Todd: OK, what is the field of the work being discussed here (in English)?     (2E53)

[12:53] David Eddy: @Mark... personally I wouldn't use either Google or Apple as examples. Just adding new content is trivial. Understanding how the "new" things are connected is the hard part.     (2E54)

[12:53] Ravi Sharma: Paco- how do you relate the paper to metadata and vocabulary?     (2E55)

[12:55] David Eddy: @JohnS... totally agree... the "publish or perish" model is antithetical to what we need to accomplish here.     (2E56)

[12:55] ToddSchneider: Janet, my interpretation is effective use of data.     (2E57)

[12:56] David Eddy: @JohnS... from observing CSAIL... VERY, VERY hard for academics to get their hands on anything other than teeny tiny systems.     (2E58)

[12:57] Ravi Sharma: David and Mark - can that not be figured out from the usage frequency associated with a context and topic?     (2E59)

[12:57] David Eddy: @Ravi... where would you get usage stats from systems inside an organization, behind multiple firewalls?     (2E60)

[12:58] Mike Bennett: So it's not a padded cell then.     (2E61)

[12:59] Leia Dickerson: Dropping off. Great discussion. Thank you everyone.     (2E62)

[12:59] Gary: Padded cells are normal for evil mad scientists.     (2E63)

[12:59] Mike Bennett: Hence my relief.     (2E64)

[13:00] Ravi Sharma: Paco- About Data governance, is it more stringent than MITA used in Healthcare?     (2E65)

[13:00] janet singer: @Todd: That is a great framing. So the reason there has been so little progress in the effective use of data since the 1970s is that the problems of interoperability are difficult, and efforts keep getting distracted by new silver bullet fads that are expected to change everything     (2E66)

[13:01] David Eddy: @Janet... when "change" is really yet another layer of poorly documented complexity, layered on top of previous decades of poorly documented complexity.     (2E67)

[13:01] janet singer: @David Yes!     (2E68)

[13:02] David Eddy: "inferring links" sounds really sketchy.     (2E69)

[13:02] John Sowa: For more advanced work, I suggest .     (2E70)

[13:02] Mike Bennett: @Janet I think part of that is the urge to find technology solutions for what are not technology problems (like how something means something).     (2E71)

[13:03] David Eddy: I'm just beginning to see some studies saying solid AI work is 80%+ people & 20% tech     (2E72)

[13:04] janet singer: @Mike: Right, maybe hope in silver bullets is hope in technology solutions     (2E73)

[13:04] David Eddy: For automating the metadata collection & maintenance process...     (2E74)

[13:04] David Eddy: A John Zachman implementation     (2E75)

[13:05] Mike Bennett: @Janet exactly.     (2E76)

[13:05] John Sowa: David, please look at slide 47 ff about the Legacy re-engineering problem.     (2E77)

[13:06] David Eddy: @JohnS... will do.     (2E78)

[13:06] John Sowa: That was implemented in November 2000.     (2E79)

[13:07] John Sowa: That is far and away more advanced than anything you'll find in the knowledge graph land.     (2E80)

[13:07] Mark Underwood @knowlengr: Great talk, a lot to learn here     (2E81)

[13:07] David Eddy: (I'm only the "scribe") ... mine was early 1980s.     (2E82)

[13:07] David Eddy: slide 48     (2E83)

[13:08] John Sowa: Unfortunately, most of the best applications were funded by people who don't want their data and techniques publicized.     (2E84)

[13:08] ToddSchneider: Consider bits from the Information Artifact Ontology and/or bits from the Common Core Ontologies.     (2E85)

[13:09] David Eddy: @JohnS... correct     (2E86)

[13:12] David Eddy: Inside the firewall... "Gee... I can't find what I need, so I'll just make the Nth copy of what already exists."     (2E87)

[13:13] Ravi Sharma: Paco - connection between metadata and ontology?     (2E88)

[13:13] David Eddy: Excellent librarianship book... "An Emergent Theory of Digital Library Metadata: Enrich then Filter" Getaneh Alemu & Brett Stevens.     (2E89)

[13:13] David Eddy: NOTHING about software systems     (2E90)

[13:13] BobbinTeegarden: Where does librarian S. Ranganathan stand in all this (graph-shaped library organization)...?     (2E91)

[13:14] Gary: BTW, next week's speaker is Sargur Srihari speaking on "Probabilistic Knowledge Graphs"; see     (2E92)

[13:14] Mike Bennett: @Bobbin Ranganathan, the grand-daddy of library science / classification theory.     (2E93)

[13:14] David Eddy: TotD - "terminology of the Day"     (2E94)

[13:18] John Sowa: TFO.     (2E95)

[13:18] Ravi Sharma: Mike and Bobbin - was Ranganathan's Colon Classification based on properties / attributes associated with the subject or topic of the book?     (2E97)

[13:19] John Sowa: Todd, IAO and BFO are primitive kludges.     (2E98)

[13:19] John Sowa: I would not recommend them for any serious applications.     (2E99)

[13:19] Ravi Sharma: I do not know why the Dewey or ISBN systems took over, or why they were more suitable.     (2E100)

[13:20] janet singer: We need a Grand Challenge or Manhattan Project on the boring thorny issues in effective and efficient data use     (2E101)

[13:20] David Eddy: @JanetS... correct observation... I've just finished "Tuxedo Park"... creation of MIT's RadLab ... no grand challenge today.     (2E102)

[13:21] Ravi Sharma: thanks Paco     (2E103)

[13:22] ToddSchneider: Industrial Ontology Foundry -     (2E104)

[13:24] Ravi Sharma: Paco, could you kindly respond to the queries in this chat on the Ontolog Forum or to individual email addresses?     (2E105)

[13:24] ToddSchneider: Meeting ends @13:22 EST     (2E106)

Resources     (2F)

Previous Meetings     (2G)

Next Meetings     (2H)