ConferenceCall 2024 03 27/Transcript: Difference between revisions

Ontolog Forum

Latest revision as of 13:32, 30 March 2024

Transcript for the Markus J. Buehler Session

This session was held on 27 March 2024. The transcript was generated by Zoom and was lightly edited by Ken Baclawski. Unfortunately, one can expect that there will be numerous flaws in the transcription.

Introduction

12:03:31 Markus has won several awards. And he has also been made a member of the National Academy of Engineering recently.
12:03:36 So at a very young age Markus has achieved a phenomenal amount of recognition, so without much ado I hand it over to Ravi to make a final remark, and then, Markus, again welcome to the Ontology Summit.
12:03:51 We are very pleased to have you here.
12:03:52 Yes, thank you, and thank you, Markus, for coming. The point I want to make is that each one of the prizes that Markus has got, besides his own distinguished career,
12:04:06 is a hero and a model for all of us in physics. For example, of course you do remember
12:04:17 Feynman along with Schwinger and Gell-Mann but Feynman was the apex of the team.
12:04:27 Similarly, ASME Drucker Medal, similarly J.R. Rice Medal.
12:04:39 Please continue and wish you all the best. We have a little bit older generation, at least I am.
12:04:44 So I very much welcome you. I'm excited to hear, especially about your multidisciplinary work in this area.
12:04:54 1,000 papers in one discipline. Let's hear from Markus...

Markus Buehler Presentation

12:05:00 Great. Yeah. Well, thank you so much. And I mean, the credit really goes to my students and collaborators and my mentors, who made this possible for me.
12:05:10 So let me share my screen. Yeah, I'm really excited to be here.
12:05:15 I've been interested in this topic for a long time. And so I'll go through
12:05:20 some fundamentals of the techniques we've developed. And I don't know how many of you are familiar with AI and generative AI in this field.
12:05:29 I think many of you are somewhat familiar, but I'll review a little bit of the work done in that field as well.
12:05:33 And then the second half of the talk will focus heavily on graph reasoning and related topics.
12:05:43 So I've been working on this actually for a long time. Some of you might know the work I did in 2010, 2011, 2012, shortly after I came to MIT, on category theory and graph
12:05:51 representations of knowledge, with ologs, with David Spivak at the time. And so for many years I've been trying to build graph-based systems
12:06:00 that can understand and represent and maybe predict new connections and new knowledge across different fields. For many years this was very hard to do, and actually we couldn't really discover new things other than by using traditional scientific methods, so we go to the lab, take measurements, and humans basically look at the data.
12:06:24 Now we have abilities to automate this process and actually make discoveries through AI systems, and that's something I've been extremely interested in exploring:
12:06:34 to what extent we can do this, what the challenges are, and what this might mean for the future of science and engineering and applications.
12:06:43 So the right-hand side is kind of an old paper from my lab, which you can take a look at as well, of course, but the left one is a paper that I
12:06:51 actually put a first version of on arXiv last week. I updated it a few days ago.
12:06:57 So if you go to arXiv you can find it, and that's sort of the backbone of a lot of the stuff I'll talk about today. There's also code with this, which I haven't released yet because I haven't had the time to really fix it up and clean it up.
12:07:08 But it will be open source, so you can run all these techniques, like you said earlier, Ravi, on your own data on your own system.
12:07:18 So the goal is to share that, and so it will be open source. But to show you a little bit of what I'm doing: I'm interested in a lot of different systems, yes, but I am a materials scientist at heart, fundamentally, and I primarily look at materials that look like that on the left-hand side.
12:07:37 So they usually have multiple structures, from the nanoscale to the macroscale. And the goal in materials science and engineering is a lot of times to figure out how the material works and how we design new materials for new purposes.
12:07:54 And that's, I'm sure, something that many of you have thought about as well. That's an incredibly complex problem, and one of the reasons why this is so complex is that these are really local heterarchical systems.
12:08:05 So they're not really hierarchical in the sense that their scales are separated. In fact, as I'll show you in examples later,
12:08:10 a lot of these systems have to be understood really in their entirety, the entire set of scales and features.
12:08:17 And we don't know a priori what is important and what is not. That's something that I've been struggling with for a couple of decades, since I got into this field.
12:08:26 How do we do this well? And I think we have now actually figured out how to do it, and I'm excited about this.
12:08:33 So the structural complexity is very, very large in materials. This is sort of a snapshot of something.
12:08:40 Yeah, the protein on the right-hand side at the nanoscale, and then the same protein at the macroscale, and you can see that there are obviously lots of different features in there. The difficulty is that you don't know
12:08:53 which of these features are used for what, if you're interested in understanding the significance of any of these geometric features or other features.
12:09:02 And if you're a designer, you want to be able to predict what the optimal design is for a certain purpose.
12:09:08 And even if you just take a subset of the space, if you look at a single molecule, it's already a gigantic design space
12:09:15 which is far greater than anything conventional techniques can solve. And I like to look a little bit at the history.
12:09:26 So you mentioned Feynman and others and that's great. I think we've got to take a step back:
12:09:31 how do we model physics and physical systems, especially these multi-level models of real materials? For many years, since the 1950s, we've had reasonably fast computers that could essentially solve
12:09:47 equations that usually are preconceived. And I say this because that's usually what happens: humans
12:09:54 look at data, figure out an equation, and then use computers to solve the equations most of the time.
12:10:00 And that is sort of done to various degrees, but usually that's the process.
12:10:04 And it's very slow, of course. It takes a lot of time, and I would argue that, especially if you're thinking about a multiscale modeling paradigm, which is sort of what this left-hand side shows,
12:10:13 and I'm sure folks at NIST have looked into this as well, it is somewhat limiting, because you assume at some point you can separate the scales, and that's something that we have learned, kind of the hard way I would say,
12:10:25 is actually not possible in most cases. You cannot separate the nanoscale from the macroscale a priori.
12:10:33 I mean, once you know how to do it, you can, but you usually don't know how to do that.
12:10:36 So that's a challenge that we've been struggling with. Now fast forward a little bit to the 2020s.
12:10:45 That's when we began to see that we could build generative AI methodologies that could actually begin to learn concepts in a more meaningful way.
12:10:53 And what I mean by this is they can actually integrate a lot of different types of information:
12:10:59 physics, math, what we know, but also huge amounts of data, far beyond human capabilities.
12:11:06 They can ingest, like I said, thousands of papers, they can potentially reason over these results, and they can also communicate results to us in a way that we can interpret them,
12:11:17 while having an internal representation of what they're working on or modeling. That can be extracted, if you wish, directly from the numerical representation of the model.
12:11:29 And of course we are on a path, perhaps, towards what we call AGI, which is something that would give all of these abilities a huge boost.
12:11:40 And so once you have much more generally intelligent systems, a lot of the questions that we're struggling with today might be overcome.
12:11:49 So this is an interesting time, and I'll explain in the presentation why I think this is such a special moment and
12:11:56 how these systems can be applied to physics, right? And materials and engineering and other multiscale systems.
12:12:02 So the key is really that we want to improve the way we use artificially intelligent systems to learn.
12:12:14 And that's a theme that I'll try to hit a little bit throughout the presentation. I didn't have too much time to really polish this, so apologies for that, but I hope to get the point across.
12:12:25 So you want to think about how we build AI systems that can deal with all these different kinds of data sources and concepts: things we have discovered already as humans,
12:12:35 and things we haven't figured out yet. How can we use this to not simply do what we call curve fitting?
12:12:44 I say this in a very provocative way: a lot of early-days machine learning was to train a model
12:12:49 to solve one task, and a lot of times this is done by fitting statistical representations of data to make predictions of some sort.
12:12:58 But we want to think a little more deeply about how we can improve on this, and not just fit data but actually train models to get a level of understanding of what they're working on.
12:13:13 Okay, and that is obviously work in progress, but that's what I'm really interested in exploring: what are the tools to do this, and what innovation do we need in terms of the architectures as well.
12:13:24 What you see on the slide actually is an example of a generative AI framework that was used to design a really complex material, which is visualized on the right-hand side.
12:13:39 It's a mycelium composite material, a living material that encompasses concepts of living systems and reorganization but also a variety of engineering materials
12:13:47 like proteins and clays and work with texturing our mechanisms. And it's sort of a beautiful example of how, in a system like this,
12:13:57 we can come up with truly unique combinations of, in this case, material mechanisms that have not been utilized together,
12:14:05 maybe not even on their own. They can create a whole concept of interactions which provides a very powerful outcome.
12:14:12 And so I'll walk through, in the slides, how we got to this, and some of the methodologies that underlie this particular type of
12:14:21 machine, if you wish, that can solve problems like this. So before we do this, I want to talk about... Yeah, and by the way, please interrupt me.
12:14:31 I really mean this. I have lots of slides, 60 or so, and I can go through all of them, of course, but if you have questions, please interrupt me, especially as we go into the more substantive parts of the presentation.
12:14:43 Please interrupt me any time. So how do we model physics, right? As I said earlier, we use the multiscale paradigm a lot of times.
12:14:51 And this means, for example, if you're a molecular modeler, and that's how I grew up essentially, when I did my postdoc with Bill Goddard and others,
12:14:58 you learn how to model materials by creating an interatomic potential.
12:15:06 And we use this to solve a multi-particle problem, the statistical mechanics problem. And this is powerful.
12:15:12 However, you pay a price, because we only understand local interactions between the atoms, and for us to understand the collective behavior of the system, what do we need to do?
12:15:23 Well, we need to actually integrate over time and space. Right? That's the price we need to pay.
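As a minimal sketch of what "integrating over time" means here, the Python snippet below advances a pair of atoms under a pair potential with a velocity Verlet integrator. The Lennard-Jones form, the reduced units, and every name and parameter value are assumptions chosen for illustration; the talk does not say which interatomic potential or integrator was used.

```python
# Two equal-mass atoms, described by their separation r (reduced units).
EPSILON = 1.0   # well depth (assumed)
SIGMA = 1.0     # length scale (assumed)
MASS = 1.0      # mass of each atom (assumed)
MU = MASS / 2.0 # reduced mass for the relative coordinate

def lj_energy(r):
    """Lennard-Jones pair energy, used here as a stand-in interatomic potential."""
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (sr6 * sr6 - sr6)

def lj_force(r):
    """Force on the separation, F = -dU/dr (positive means repulsive)."""
    sr6 = (SIGMA / r) ** 6
    return 24.0 * EPSILON * (2.0 * sr6 * sr6 - sr6) / r

def simulate(r0=1.5, v0=0.0, dt=0.001, steps=5000):
    """Integrate the separation in time with velocity Verlet."""
    r, v = r0, v0
    f = lj_force(r)
    trajectory = [r]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / MU   # half-step velocity update
        r = r + dt * v_half              # full-step position update
        f = lj_force(r)                  # force at the new position
        v = v_half + 0.5 * dt * f / MU   # second half-step velocity update
        trajectory.append(r)
    return trajectory, v

trajectory, v_final = simulate()
e_start = lj_energy(1.5)
e_end = lj_energy(trajectory[-1]) + 0.5 * MU * v_final ** 2
print(f"energy drift over the run: {abs(e_end - e_start):.2e}")
```

The potential only encodes the local pair interaction; the oscillation of the pair appears only after stepping through time, which is the cost the speaker refers to, multiplied by hundreds of millions of atoms in a real simulation.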
12:15:28 And so then we can describe... The movie on the right-hand side is the emergence of dislocations in a very large system, hundreds of millions of atoms.
12:15:37 And you can see there are some really interesting structures forming, which couldn't be predicted just by looking at
12:15:42 the potential. You actually have to run the dynamics and do the statistical ensemble analysis to predict that at the end.
12:15:49 And in most other problems like this too, like in polymers and proteins and biology, you're going to have to do things like that.
12:15:56 And that's not tractable most of the time. So we've been thinking about how we change this. What we really want to do is say, okay, instead of starting my process by assuming an interatomic potential...
12:16:11 So I've provocatively crossed it out. I don't know if Feynman would agree with me on this, but let's say we cross it out and we say we don't want to use this potential, because it's a very human-centric, geometric, Euclidean approach, and maybe that's not the best
12:16:25 way of describing how these very big multi-particle systems actually behave. Maybe we can have another way of discovering governing laws that are equally foundational
12:16:36 but maybe more elegant to solve. And so the hypothesis basically is that there isn't just a single way of predicting this physical system or modeling it; actually, there might be multiple.
12:16:47 And who says that we have discovered the most efficient way of doing it? Obviously we haven't, because it's so expensive to do computationally.
12:16:53 So this is to say, okay, if we want to describe a fundamental physical foundation,
12:17:00 we can do this through graphs. Graphs are a very flexible way of representing a lot of different kinds of things. It could be numbers, it could be atoms. So this interatomic potential is sort of a special case of a graph model, because it creates a graph of interactions at the atomic scale, and we integrate on that graph to get
12:17:21 the dynamics of the atoms and molecules here. But we can also train models to discover graph representations that are non-trivial and actually very complex, beyond human abilities to model a system, and in this kind of graph representation we are not limited to Euclidean representations.
12:17:42 So a point in a graph, a node isn't necessarily just an atom. It could be an atom.
12:17:46 But it could be another concept. It could be a knowledge concept, like in a knowledge graph, but it could also be a feature, a pattern in a system.
12:17:55 It could be a dynamical phase change. It could be an event happening. It could be a combination of different features that together form a cascade of events,
12:18:03 and so on. So this is the sort of thing that we are not very good at, I would say, as humans: to write down and actually come up with something.
12:18:12 And this is where AI comes in very handy. It's a tool for us to build models that actually learn these graph representations.
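To make the idea of a graph whose nodes are not only atoms a little more concrete, here is a small hand-written Python sketch. All node names, node types, and relations are hypothetical examples invented for illustration; in the approach described in the talk such graphs would be learned by a model rather than typed in by hand.

```python
from collections import deque

# Hypothetical nodes of different kinds; only one of them is an atom.
node_types = {
    "carbon_atom": "atom",
    "beta_sheet": "structural_feature",
    "phase_change": "dynamical_event",
    "toughness": "knowledge_concept",
}

# Hypothetical directed, labeled edges: (source, relation, target).
edges = [
    ("carbon_atom", "part_of", "beta_sheet"),
    ("beta_sheet", "undergoes", "phase_change"),
    ("phase_change", "influences", "toughness"),
]

def neighbors(node):
    """Nodes reachable from `node` by following one directed edge."""
    return [dst for src, _, dst in edges if src == node]

def shortest_path(start, goal):
    """Breadth-first search for the shortest directed path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path("carbon_atom", "toughness")
print(" -> ".join(f"{n} ({node_types[n]})" for n in path))
```

The path query links an atom to a material property through a structural feature and an event, which is the kind of cross-scale connection a purely Euclidean, atom-only description does not express directly.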
12:18:18 And the other thing we get from this is that if we're dealing with graphs, I can actually describe... So I mentioned earlier, I work in category theory, and that's why I like these kinds of theories.
12:18:29 They're abstract, a general sense of how we can represent almost anything we can come up with: mathematics, music, art, literature, engineering of course, and other things.
12:18:41 So if I have a graph representation I can naturally describe everything from the early-civilization ideas of symbols, mathematics, sophisticated differential geometry and so on, all the way to computational methods,
12:18:55 which really have given us a lot of leeway in the last century. In the 1950s, 60s, 70s, we had computers.
12:19:03 Today we can actually combine them. And that's something we haven't been able to do, right?
12:19:08 So with an early computational model, you program that model, and you create a solution to basically a differential equation most of the time, or some other governing law.
12:19:19 However, you couldn't easily integrate that with other knowledge, like knowledge about symbolism or philosophy, or knowledge that's written down in text, or maybe artisan knowledge about how materials are manufactured, right?
12:19:35 So there are a lot of things that we can't model in an equation or model as a number, and those are kind of separate worlds from these early-stage computational models.
12:19:45 So what we're trying to do now is to build models that can ingest a wealth of different modalities of information. As shown in the slide here, kind of schematically, you have an AI model that can build graphs and model knowledge as a graph representation of information,
12:20:04 and do this in a way that can ingest many different kinds of things, like simulation data, experimental data, equations, books, papers, patterns, even interviews with people that make materials, like I said, artisans, and so on.
12:20:19 Legal documents. I mean, there are many different dimensions to a process, in an industry or in a society, that deals with the making of something,
12:20:27 or the framework in which societies work on different systems. So all of that should be ingested, essentially, in a system like this.
12:20:36 And then we model it as a graph. And so this is what we've spent a lot of time on, trying to figure out how we actually do this.
12:20:44 How do we build models like this? And these models, if they work, should be able to describe, for example, the spider web, right, which is a complex biological material,
12:20:53 based on a set of graph representations. And the graphs should be sparse, in the sense that they are really universal governing laws that describe not only how this spider web behaves around a certain set of training observations that I have; ideally I want to describe truly how the system mechanistically functions, and that's what graphs can do,
12:21:18 especially if I train the model to learn really elementary relationships in a way that is as simple as possible. In other words, the graphs are as sparse as they can be to solve a certain level of task.
12:21:34 And I have a quote from Feynman, of course. Feynman talked about the importance of knowledge versus information, right?
12:21:42 So the idea is that you really want to convert your information, which is measurements or things about something, into how it's connected.
12:21:52 And this is exactly what we're training AI models to do: to figure out how to take these pieces of information, how to connect them, and also to learn something about how important certain features,
12:22:02 certain elements, are, and how they contribute to answering questions about the behavior or future behavior or inverse problems and so on, and how they relate, of course, across different disciplines and different modalities of inquiry.
12:22:17 So this is something that we're trying to do. And how do we do this? Well, there are sort of two or three things I want to highlight.
12:22:24 One is, if we build AI systems, we want to build them with something in mind, and it's really important
12:22:33 to go beyond saying I have a model, I'm asking a question, I'm getting an answer. What we really want to get into, I think, is to build AI models for TM models that are able to think, and the thinking process doesn't have to be a model of how humans think, but it has to represent somehow
12:22:51 a mechanism for thinking and a mechanism for learning structure. And graphs are a great platform, but I really want to emphasize this point: we want models to not just Statistically, the far on patterns.
12:23:07 We want models to have some level of understanding. Now, how do we prove this and how do we do this?
12:23:12 It's an ongoing debate. But that is the direction I'm pushing really hard: what we do is build models that can actually do this,
12:23:20 or do this better. And the question is, how do we provide a structural space for these models to actually do that? One of the ways we can do this is to look at biological inspiration again.
12:23:30 So we look at biology, and this is a busy slide, but bear with me on this.
12:23:36 On the left-hand side you see the construction principle by which biological systems are made, and a lot of biological systems are made through a mechanism we call the universality-diversity paradigm, UDP.
12:23:51 And what is UDP? It is a framework that essentially says that in biology, most biological mechanisms and functions are created
12:24:00 by using existing platforms like proteins, saccharides, chitin, you name it, the kind of universal molecules, DNA. If a biological organism needs a new function,
12:24:15 it wouldn't create a new type of DNA. It uses the DNA we have, the proteins we have, and it leverages them structurally to create functions.
12:24:24 And what I've been thinking a lot about is that in AI development a lot of times we don't do this. We actually build a model for a certain purpose, and that's especially true for more traditional machine learning techniques.
12:24:41 You have one model for one purpose, and if you want to give the model new capabilities, you train it again.
12:24:47 Right. So we've been thinking about building models that can be trained, yes, and that's important, but that also have an ability to think about themselves,
12:24:57 in the sense that instead of retraining parameters to solve a new task, they simply use existing structures in themselves to solve new tasks.
12:25:08 And this is something we can train models for. We can do this, for example, based on graph-forming attention models, which is the basis for a lot of the stuff I'll talk about,
12:25:17 by forcing these models to do what we call two forward passes. In other words, instead of asking the model the question and getting an answer,
12:25:24 I'm asking a question, and the model understands it needs to think about its own structure. So it looks at the question.
12:25:31 It thinks about the question and asks itself: how can I reconfigure my own architecture,
12:25:39 not the parameters, but the way I'm built, such that I can better answer the question?
12:25:44 And this is a learning process. So we can teach the model to think about itself. There's a training process involved.
12:25:50 And then you make a prediction. And this allows these models to essentially work on solving new tasks without having been trained for these tasks.
12:26:01 And this is inspired by, of course, a lot of work that's been done in the last years on new methodologies for doing better inference, especially in the world of multimodal language models, which is a powerful framework because these models are able to deal with
12:26:18 multiple modalities. They're also able, of course, to deal with graph representations, because they are ultimately, in their interior, graph models.
12:26:28 And I don't want to go into more detail on the mechanics of this, but most of you who work with these models, of course, understand or know that an attention mechanism is a graph-forming mechanism.
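One way to read the remark that attention is a graph-forming mechanism is that the attention weights form a weighted adjacency matrix over the tokens. The following is a minimal pure-Python sketch of standard scaled dot-product attention weights under that reading; the toy vectors, the function name, and the edge threshold are assumptions for illustration and are not taken from the speaker's models.

```python
import math

def attention_graph(queries, keys):
    """Scaled dot-product attention weights, read as a weighted adjacency matrix.

    Row i holds the weights of the directed edges from token i to every token j.
    """
    d = len(queries[0])
    adjacency = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        top = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        adjacency.append([e / total for e in exps])  # softmax over each row
    return adjacency

# Toy token embeddings (hypothetical); tokens 0 and 1 are similar, token 2 is not.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = attention_graph(tokens, tokens)

# Keep only the stronger off-diagonal edges to obtain a sparse graph.
THRESHOLD = 0.3  # arbitrary cutoff chosen for this example
edges = [(i, j, w) for i, row in enumerate(A)
         for j, w in enumerate(row) if i != j and w > THRESHOLD]
for i, j, w in edges:
    print(f"token {i} -> token {j}: weight {w:.3f}")
```

Each row sums to one, so every token distributes a fixed budget of attention over the others, and the graph is recomputed for every input rather than being fixed in advance.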
12:26:40 And if we can build on this and leverage this, we can do really great things with these models.
12:26:44 And that's what the rest of the talk will be focused on: how do we leverage this to higher levels.
12:26:51 So this is kind of a way to think about models that self-reflect and, instead of giving an answer right away, learn how to think. Here is a visualization of what this looks like. What we're showing here is how the model thinks about itself,
12:27:07 its own structure, and how it rearranges its own structure as it answers the question. So again, it's not just answering the question, it's actually reflecting upon its own structure at every single iteration when we predict a new pixel or a new word or whatever we're predicting.
12:27:23 That's not enough though, even though this is interesting because it gives us the ability to combine a lot of different expertise and skills.
12:27:32 Right, so if you have models trained on the ability to reason, or the ability to recall certain types of knowledge,
12:27:39 by doing these self-awareness training loops you can combine a lot of these in innovative ways, because they can be combined.
12:27:48 It's a graph-forming mechanism, so you can get emergent new skills and abilities from this.
12:27:53 But you can go even further, and the next level of what we do in pretty much all of our work now is multi-agent modeling.
12:28:03 So you also want to think about the fact that a model can be powerful on its own, but it really has to have somebody to talk to, if you wish, and the talking doesn't necessarily mean
12:28:12 human-understandable language, but it needs to exchange information, and that information can be cryptic, it can be machine-readable only, or it can be readable for humans, and that's actually quite flexible.
12:28:25 So a lot of times we build models that have multi-agent setups.
12:28:30 And a way of thinking about that is as a multi-particle system, which is again where I come from.
12:28:35 This creates a system of systems where you have now a particle as an agent. It has very nonlinear relationships with all the other agents.
12:28:46 And for me, yeah, this looks like an interatomic potential, in quotation marks.
12:28:50 And you get amazing things. We know that when we have nonlinear relationships between particles, and put a bunch of particles in a box and simulate how they evolve dynamically, some really interesting things can happen, and exactly this is happening here as well.
12:29:05 So there are going to be capabilities evolving and emerging that were not obvious by looking just at a single part.
12:29:10 They come about because we have many interactions. So these kinds of systems can solve incredibly complex tasks fully autonomously.
12:29:19 So now let me talk really briefly, before I go to the applications, about kind of the space we're in. So we're dealing with materials design.
12:29:28 It's a huge space. For just a small protein that has maybe a hundred amino acids,
12:29:34 we have something like 10 to the 130 possible designs. And of course there aren't even enough atoms in the universe to make all of these.
12:29:43 And so even if you had an extremely rapid engine to make these materials in the lab, there wouldn't be enough atoms to really make all of them.
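The figure of roughly 10 to the 130 can be checked with a short calculation, sketched below in Python. It assumes the 20 standard amino acids at each of 100 positions, and it uses the commonly quoted order-of-magnitude estimate of about 10 to the 80 atoms in the observable universe; that estimate is an assumption brought in for comparison and is not a number given in the talk.

```python
import math

N_AMINO_ACID_TYPES = 20        # standard amino acids available at each position
SEQUENCE_LENGTH = 100          # "maybe a hundred amino acids"
LOG10_ATOMS_IN_UNIVERSE = 80   # rough, commonly quoted estimate (assumption)

# Every position can independently hold any of the 20 amino acids.
n_designs = N_AMINO_ACID_TYPES ** SEQUENCE_LENGTH
log10_designs = math.log10(n_designs)

print(f"possible sequences: about 10^{log10_designs:.1f}")
print(f"orders of magnitude beyond the atom estimate: "
      f"{log10_designs - LOG10_ATOMS_IN_UNIVERSE:.1f}")
```

Since 100 times the base-10 logarithm of 20 is just over 130, the count is consistent with the number quoted by the speaker.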
12:29:52 So this is clearly an intractable challenge. I mean, probably all of you are aware of that, but sometimes when I give a general-audience talk, I show that, and I want people to understand how complex these are,
12:30:01 even for small systems. The other thing is the hierarchical feature of the system. And that's shown here.
12:30:09 So there are many examples. One is disease biology. You can have a mutation
12:30:13 in a protein, and it could be a single point mutation; in other words, one out of hundreds or thousands of amino acids is different.
12:30:24 And sometimes this means absolutely nothing. So people don't get sick. You don't get sick.
12:30:27 But in some cases, if the mutation happens to be at the wrong spot, or the right spot, depending on how you look at it, you're going to get a disease like rapid aging disease, brittle bone disease, and many others.
12:30:37 So it's an outsized effect of something that's really tiny. So if you were to take a traditional engineering approach, where
12:30:44 you would say, yeah, I have a protein, I'm going to average out this protein, I'm going to make it into a continuum model or particle model,
12:30:51 that doesn't work here at all, because you're going to miss features, and those features are only visible once you have a large system. In other words, if I look at the protein that causes this rapid aging disease, I could in no way figure out that this is going to be a disease, a pathological mutation.
12:31:10 It only works once I understand how this protein behaves in the ensemble of an organism, a human.
12:31:17 And the same is true for fracture problems, right? In fracture problems, we usually have small-scale defects, things like cracks, and then you've got a big system that you're interested in, like an engineering structure, and you have this outsized effect of something really tiny at the crack tip, similarly to the disease applications.
12:31:39 So there are a couple of things that are really interesting now. I mean, we have sort of a framework for what we can do.
12:31:45 But we also have compute needs, and that's the famous Kurzweil curve.
12:31:51 It's basically talking about how much cheap compute is available, and that means we can basically add computational complexity
12:31:59 to solve problems. And the idea is that if you have a good idea of how to build your model,
12:32:08 and if you have cheaper compute, something really, really interesting can happen, because now you're making the system more complex,
12:32:14 but you're getting an outsized effect of your complexity, hopefully, on the behavior. And there's a lot of debate about this:
12:32:19 Is the emergent effect in Large Language Models real? Is it something?
12:32:29 But let's put it that way: I think there's definitely something interesting going on, and it's not something
12:32:34 that's totally out of this world, because if I take a multi-particle system in physics, I know the size effect.
12:32:42 So if I take 10 atoms, or a million atoms, or 1 billion, and I confine them in a particular way, like in 2D or 1D, we know they behave very differently.
12:32:52 So size effects and emergent phenomena are something that I'm quite familiar with from statistical mechanics, and I believe something like this is happening here as well.
12:33:01 So we have complexity. We have a framework that allows models to learn in a more intelligent way. And we can do pretty exciting things. For example, what I show you now are graph-based methods that can solve really complex forward and inverse problems; in the beginning, what I'll show you are just single-modality models.
12:33:24 Here we have a model that can take in the amino acid sequence as an input, and it can predict the motion of a molecule,
12:33:34 like in this case a protein. So without actually ever seeing the protein, it understands how to predict how this protein will move, as it has a model representation of
12:33:44 the structural dynamics of this particular system. That's pretty mind-blowing, and in fact, for me personally, the only way I could have done a problem like this a couple of years ago would be by running a very large-scale molecular dynamics simulation.
12:34:01 Now I can do this basically on my laptop once I have trained this model. And so those are the kinds of things, like AlphaFold predicting structures of proteins and so on, that are quite powerful, with limitations.
12:34:11 But I think we've come a long way in doing this, and we can make very accurate predictions, especially if I can
12:34:16 check predictions against physics. So that's something that of course is important here, and I'm happy to talk more about how we do that.
12:34:25 But a model like this can be trained really based on molecular mechanics, quantum mechanics,
12:34:30 quite accurately. We can do more complex things; we can solve inverse problems. So this is a paper we published last year on designing proteins using, in this case, a diffusion model; it's a combination of a diffusion and an attention model.
12:34:43 There have been a lot of reports of things one can do with this that are very powerful.
12:34:48 Like for example, if you know the structure of a molecule you want, you can design for that.
12:34:52 I can design the amino acid sequence, or the polymer composition, or the chemistry that makes this particular design, and you can basically solve an inverse problem that would have been totally intractable until very recently.
12:35:11 And this can be extended even to dynamics problems.
12:35:22 So this is a review of literature going a little bit back to the 1990s, when we started looking at protein dynamics, understanding how they unfold and how they fold and how they refold, and there's a lot of work being done there, experimentally as well as computationally.
12:35:30 These are very expensive, obviously, because you're dealing with the dynamics of small things, and even though they're small, they're very complex.
12:35:38 What we can do now is we can very efficiently solve the forward problem. So if you give me a design, like a sequence or chemistry, I can tell you how this molecule is going to behave dynamically, but I can also
12:35:50 solve the inverse problem. That's the amazing thing. If you tell me this is the kind of behavior I want,
12:35:57 the dynamic behavior of my molecule, I can give you design suggestions to build this molecule.
12:36:04 And we use this, just as a sideline, in lots of applications, obviously: finding new replacements for plastics,
12:36:13 building super strong fibers, which we're doing with the Army and DoD and others, to food applications, work for the USDA in agriculture, creating new kinds of proteins as food substitutes or as applications that can be used to make better food, and to tissue engineering, somatic applications.
12:36:32 There are lots of different ways in which you can apply these techniques. You can also apply this to more conventional engineering applications.
12:36:42 So this is an example where we're applying this to designing composite materials. And again, you tell me what kind of composite, what kind of constitutive relationship you want,
12:36:49 and I can design it for you by solving the inverse problem. So this is sort of a very interesting world, where I can do things that really weren't possible.
12:37:01 And I can do fracture problems as well. So I work a lot on fracture.
12:37:04 It's a passion of mine, and we've also developed models to describe the dynamics of fracture as well.
12:37:10 Now, we can go even further than this, and this is where we're getting to the graph-based representations and so on.
12:37:18 So first of all, if I have a multimodal model, and here we're actually using this X-LoRA model that I introduced earlier,
12:37:26 such a model can do many different things. It can actually predict, like shown here, force-extension behaviors of proteins or chemistries.
12:37:37 It can predict numbers, it can solve the inverse problem. But it can also reason over the results.
12:37:44 And it's a little bit small, but probably on the screen you can see it. So if I go in and I say, okay, I have a bunch of different
12:37:49 protein sequences, calculate the properties, it does that, and then you can say, I want you to figure out what's the best sequence for the highest stability.
12:38:01 Here's how the model responds, and then I can say, tell me why this is.
12:38:06 Right, so what is actually going on? Why do you think this sequence is the most stable one? And because the model is able to access, in a lot of different ways, its own internal representation of knowledge, it can make these connections.
12:38:20 It will tell us, yes, the strongest protein, the most stable protein, is the most stable because of a certain type of chemical bonding and so on.
12:38:28 So the whole conversation sort of talks about these in a human-AI interaction, in this case. Ultimately, I can validate this by building a model of these proteins using more conventional techniques, physics-based modeling; I can check these and I can actually see that what the model has predicted actually holds. So this is very interesting evidence that these models are actually very good at building
12:38:51 internal representations of the physical world, if trained in the particular way that we're training them.
12:38:57 I can do a similar thing for polymer design. So this is another example. And I think it sort of shows a workflow
12:39:04 similar to the previous one, but a little bit more from a design and discovery perspective. So I'm starting off and say I have a molecule, this one at the top here, and I say, I want to use this molecule.
12:39:17 I randomly picked this from a data set that I was working with. And, pay attention to what it was.
12:39:24 So I said, here's my molecule, I want to make a polymer out of this.
12:39:29 What can I do? And so it gives me suggestions on what the different strategies are.
12:39:37 The most important one is thinking about reactivity. And so then I go ahead and I say, okay, I want to focus on the reactivity.
12:39:47 I actually have the ability, that is, my model has the ability, to compute and solve the inverse problem for 12 different features, quantum mechanical features of molecules, and I tell it, these are the things I can actually compute
12:40:00 and design for. What are the 3 you want me to address? And how do I change them?
12:40:07 Like, do I make them larger or smaller? So the model gives the answer shown on the bottom. And then I actually implement this.
12:40:13 And so now I'm actually creating an autonomous agentic system where I basically say, here's my design objective.
12:40:19 I want to increase the HOMO-LUMO gap, dipole moment and polarizability.
12:40:26 Those are the 3 things I want to do: I want to decrease the band gap, and I'm going to increase the dipole moment and polarizability for increased reactivity.
12:40:33 And from then on, the system actually works completely autonomously. So it generates solutions.
12:40:40 It uses the ability to check the solution by using an agentic interaction. So these are adversarial relationships: the agents aren't just applauding each other, they're actually fighting it out, right? So the one agent makes a suggestion based on its general abilities.
12:40:55 The other agent checks the results and analyzes the structure and analyzes the properties. And there is an interaction going on, and then ultimately the system will evolve through different iterations
12:41:08 to come up with an optimal design, which is shown here on the right hand side. Of course, I'm getting a ranking.
12:41:14 I'm getting a whole set of different solutions. And at the end I get this molecule, which I can then analyze more deeply. So I can now go back as a human and analyze this. I can actually use an AI system as well, but for the sake of this particular paper, of course, I wanted to do this as a human.
12:41:29 I did it also with AI, but I also wanted to understand it myself, of course, to validate predictions.
12:41:34 And you can see, what this model came up with is what we call a lactam molecule, which actually, interestingly, if you go into the history, is the basis for nylon.
12:41:46 Of course, we've known nylon for a long time, but the model has actually figured out that to build a polymer out of this platform you've got to create these heterocyclic amides, which is what the lactam molecule is, or has a ketone group, and
12:42:02 in different versions of nylon you're going to have different configurations. The one that the model has come up with here is a new one, because I'm actually forcing the model to discover a chemical structure that does not exist at all in the literature.
12:42:17 So I'm forcing the model to discover something new. And it came up with something totally new.
12:42:21 And of course it can understand that and reason. And so in fact, there's a pretty good likelihood that you can actually make a polymer out of this.
12:42:29 Which is fascinating. And so I won't go into more detail on this particular example here, but we can extend this also to include physics.
12:42:36 So in other words, multi-agent models can include physical engines as well: either experiments or maybe a quantum mechanical simulation.
12:42:45 So in this case here, it's a paper we published recently, or it's actually on arXiv but it's in review. In that paper we focus very heavily on protein discovery, and we have a range of different agents here,
12:42:59 some AI-based, some physics-based. So they have the ability to generate new data on the fly when needed.
12:43:05 And I think that's really the exciting part: you can truly integrate, like I said at the outset,
12:43:11 data-driven modeling with physics-based modeling and the many other modalities of interaction.
12:43:16 So now let's talk about graphs. Okay, so I know this is sort of the main thing we want to talk about: representation.
12:43:26 So the model itself internally has a graph representation in itself, the LLMs, of not just language,
12:43:33 but numbers and figures and images and all these things. But we can now sort of put that in a larger framework of ontological knowledge representation as well.
12:43:44 And so what we've done here is, as you mentioned in the introduction, in this particular case we used a framework of AI systems to construct a knowledge graph.
12:43:55 And the beautiful thing there is that we can automate this process.
12:43:59 So unlike back in the old days, 20 years ago, or 10 years ago, when we built all our representations using category theory manually on a piece of paper, had to know everything about the system already, and it took us an enormous amount of time,
12:44:12 now I can construct these graph representations automatically over a few days. So this still takes a lot of compute, but I can actually construct
12:44:20 knowledge graphs, ontological knowledge graphs, systematically, through generative AI tools.
12:44:28 And the way it works is that you have a body of knowledge, like in my case, I looked at 1,000 papers, and I trained models to create representations of knowledge in these graph structures, like nodes,
12:44:41 concepts, and connections between the concepts. There are a couple of steps involved, and you can read up on this in the paper,
12:44:47 on how we actually produce graphs that are consistent, as consistent as possible in the way the nodes and relationships are named.
12:44:54 And that can be done by iterating through the generative process. But it can also be done using what we call deep node embeddings.
12:45:03 And this is something really important, because if you're constructing these graphs from many different sources, you're going to have slightly different ways by which certain concepts are referred to.
12:45:13 And so we have a step involved here that essentially unifies the nomenclature across different sources, and this can be done actually very well using deep language embedding methods, and so this allows us to build
12:45:28 graph models that are consistent and well connected across many different domains, many different sources, many different papers.
12:45:35 And it can be repeated, obviously, and made even better. So there's sort of an iterative process involved here.
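The nomenclature-unification step described above can be illustrated with a small sketch. It is not the method from the paper: the talk refers to deep language embeddings, while this example substitutes a toy character-trigram vector so that it runs without any model. The function names, the labels, and the 0.5 threshold are all assumptions made for illustration.

```python
# Toy sketch: merge node labels that refer to the same concept.
# ASSUMPTION: character-trigram counts stand in for a learned deep
# language embedding; names and threshold are hypothetical.
from collections import Counter
import math

def embed(label):
    """Toy embedding: counts of character trigrams of the padded label."""
    text = f"  {label.lower().replace('-', ' ')}  "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def unify_labels(labels, threshold=0.5):
    """Map each label to the first-seen label it is sufficiently similar to."""
    canonical = []
    mapping = {}
    for label in labels:
        vec = embed(label)
        best = max(canonical, key=lambda c: cosine(vec, embed(c)), default=None)
        if best is not None and cosine(vec, embed(best)) >= threshold:
            mapping[label] = best
        else:
            canonical.append(label)
            mapping[label] = label
    return mapping

labels = ["mycelium", "Mycelium", "mycelia", "hydroxyapatite", "hydroxy-apatite", "graph"]
mapping = unify_labels(labels)
for label, canon in mapping.items():
    print(f"{label!r} -> {canon!r}")
```

With a real embedding model, the same greedy loop would also be able to merge synonyms that share no characters, which the trigram stand-in cannot do.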
12:45:40 I can then use these graphs to do quite interesting things. And so the first thing I can do, I can help a language model answer questions. Right?
12:45:52 And language models are not very good usually. If you ask a question to a model, it's kind of a single-shot answer; it might be correct but it might be wrong, and the degree to which these are correct or wrong depends on the model and the size of the model and the way we trained it, but many times a model will struggle, actually.
12:46:09 And so what you can do, you can use graph-based reasoning to help it answer questions better.
12:46:15 So instead of just asking a question and letting the model answer it, you can say, here's my question.
12:46:20 And here's, by the way, some concepts that I can derive from the graph that relate the question I'm asking to a subgraph or multiple subgraphs.
12:46:30 And the model then includes its own understanding of the question you're asking and the graph information you're giving it, like which nodes are related to the question you're asking.
12:46:41 And this is something like a RAG-based, graph-based reasoning strategy. And the answer then becomes much better, of course, because there's a relationship of concepts that the model might or might not understand in its own internal representation.
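The idea of handing the model graph-derived concepts along with the question can be sketched as follows. This is a minimal illustration, not the pipeline from the talk: the triples are invented, retrieval is plain keyword matching (a real system would more likely use embeddings), and the prompt would be sent to whatever language model is in use.

```python
# Minimal sketch of graph-augmented prompting.
# ASSUMPTIONS: the triples, the keyword-based retrieval, and the prompt
# wording are hypothetical; no real LLM API is called here.

def related_triples(triples, question):
    """Return triples whose subject or object appears as a word in the question."""
    words = {w.strip("?.,").lower() for w in question.split()}
    return [t for t in triples if t[0].lower() in words or t[2].lower() in words]

def build_prompt(triples, question):
    """Combine the question with the relevant subgraph, written out as text."""
    context = "\n".join(f"- {s} {r} {o}" for s, r, o in related_triples(triples, question))
    return (
        f"Question: {question}\n"
        f"Related graph concepts:\n{context}\n"
        "Answer using the relationships above."
    )

knowledge_graph = [
    ("mycelium", "forms", "networks"),
    ("mycelium", "enables", "biodegradable composites"),
    ("biodegradable composites", "supports", "sustainability"),
    ("silk", "has", "high toughness"),
]

prompt = build_prompt(knowledge_graph, "How could mycelium support sustainability?")
print(prompt)
```

Only the triples that touch the concepts in the question end up in the prompt, so the model sees the relevant subgraph rather than the whole graph.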
12:46:53 So you get much better, much more nuanced answers and relationships. And you can do this of course as well in an agentic setting.
12:47:01 So in an agentic setting, you can expand on this reasoning ability even further.
12:47:09 So instead of saying I have a graph and I want an answer, you can say I have a graph and I want an answer, but I want multiple adversarial AI agents to figure out the answer and then tell us. So a human user might not see what's going on behind the scenes; you're only going to get the final answer, but to get the answer, there is a complex reasoning step involved
12:47:29 by which multiple agents reason over the results, compete, and finally figure out what is the best answer, what's the actual answer to the question, with all the rationale behind it and so on.
12:47:40 We've used this in a couple of ways, including in predicting new behaviors of materials. So what you see on the right hand side is one of these studies.
12:47:51 We have shown that we can actually use these types of reasoning engines to come up with hypothetical material
12:47:58 studies, like experiments. And a model can predict correctly what would happen, what this experiment would actually lead to.
12:48:06 So it has a model representation of behaviors. And in this particular case, we did this test in the sense that we had trained our model and built our knowledge graph,
12:48:16 and then we used a paper that was published after this was all done. So the model had no idea this paper existed and no access to it.
12:48:24 And so we had actually real experimental validation of the predicted behavior. And it does a really good job of this.
12:48:30 So this is really exciting. So these are kind of glimpses, I think, of what we can do in the future
12:48:34 with these types of complex AI models. Other things we can do with graphs, and now we're getting into much more interesting things, actually: let's say I'm interested in mycelium materials, which is a field of study that's pretty niche, very small, using mycelium, mushroom structures, living materials,
12:48:53 to make materials. So that's a small field, but it's complex in the sense that it involves biology, it involves engineering, it involves material processing, it involves sustainability
12:49:04 considerations and so on. And we have worked on this in my lab in the last couple of years, and it's very challenging to make clever design suggestions because we have no model for these behaviors.
12:49:14 So we then thought, can we do this more elegantly and actually use GenAI techniques like the ones I've shown, but incorporate really sophisticated graph reasoning abilities?
12:49:25 So the main part of that paper that I mentioned in the beginning is really about building
12:49:29 these graph reasoning strategies. So what we can do: I can have a graph representation of a corpus, a large set of literature.
12:49:37 It might be literature in materials science. And I have another, smaller graph of knowledge in a specialized field like mycelium biology.
12:49:46 And what I can then do, I can merge these graphs together. And I'm seeing here that, yes, these graphs both talk about overlapping concepts.
12:49:56 And so when I merge them, I get a lot more connections. So now I can run path-finding algorithms, sampling paths in this graph. So I can say I'm interested in mycelium
12:50:05 and sustainability. What are the connections between them? I can find the shortest path, obviously.
12:50:11 And it sort of tells the story of how sustainability and mycelium are connected, right?
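The merge-then-find-a-path step described above can be sketched in a few lines. This is an illustrative example only: the two small graphs and their node names are made up, and breadth-first search is used as the simplest shortest-path method for an unweighted graph; the talk does not say which algorithm was used.

```python
# Sketch: merge two concept graphs on shared node names, then find the
# shortest path between two concepts with breadth-first search.
# ASSUMPTION: graph contents and node names are hypothetical.
from collections import deque

def merge_graphs(*graphs):
    """Union of undirected adjacency dicts; shared node names are identified."""
    merged = {}
    for graph in graphs:
        for node, neighbors in graph.items():
            merged.setdefault(node, set()).update(neighbors)
            for n in neighbors:
                merged.setdefault(n, set()).add(node)
    return merged

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for n in sorted(graph.get(node, ())):
            if n not in previous:
                previous[n] = node
                queue.append(n)
    return None

materials_graph = {"sustainability": {"biodegradability"}, "biodegradability": {"composites"}}
mycelium_graph = {"mycelium": {"chitin", "composites"}, "chitin": {"cell wall"}}

merged = merge_graphs(materials_graph, mycelium_graph)
path = shortest_path(merged, "mycelium", "sustainability")
print(" -> ".join(path))
```

Neither graph alone connects the two concepts; the path only exists after the merge, through the shared node `composites`.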
12:50:17 And I can then use this path to answer a question. Like, let's say I wanted to develop a new research hypothesis, like a better question that I could ask.
12:50:25 Or maybe I want to predict the behavior of a system. This is extremely powerful because individually we have knowledge about mycelium and knowledge about sustainability, but we haven't really figured out the connection.
12:50:39 With these graph-forming algorithms, I can build these connections and I can sample them. I can then have LLMs or multimodal LLMs
12:50:46 reason over those graphs. And so what's shown here actually are these graph representations; I can extract a subgraph
12:50:52 about a particular path that I've sampled. I can just take them literally the way they come, or I can do additional graph processing.
12:51:01 I can understand, okay, if I have a graph like this, I can measure certain features of these graphs, like focusing on bridging centrality of nodes.
12:51:10 And I can tell from these which nodes are important for critical purposes; some nodes are really critical to understand the connection, right, between the 2 concepts.
12:51:17 Other nodes might be really important but very poorly connected. And this gives me a lot of ways to dig deeper into research questions.
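For readers unfamiliar with bridging centrality, here is a small sketch of one common definition: a node's betweenness centrality multiplied by its bridging coefficient (the inverse of its degree divided by the sum of the inverse degrees of its neighbors). The talk does not specify which formula was used, so this definition, the function names, and the toy graph are assumptions.

```python
# Sketch: bridging centrality = betweenness x bridging coefficient.
# ASSUMPTION: this is one common definition; the talk gives no formula.
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

def bridging_centrality(adj):
    bc = betweenness(adj)
    scores = {}
    for v, neighbors in adj.items():
        inverse_sum = sum(1 / len(adj[n]) for n in neighbors)
        coefficient = (1 / len(neighbors)) / inverse_sum if neighbors else 0.0
        scores[v] = bc[v] * coefficient
    return scores

# Two triangles (a, b, c) and (d, e, f) joined only through node x.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = bridging_centrality(adj)
top = max(scores, key=scores.get)
print(top, scores[top])
```

In this toy graph the node `x`, which has only two connections but sits between the two clusters, gets the highest score. That matches the remark above about nodes that are important for a connection yet poorly connected themselves.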
12:51:28 Ravi, you have a question?
12:51:31 I think you're muted. Sorry.
12:51:37 Ravi is muted. So yeah, I think we can probably wind up and then we can get to the question.
12:51:42 Yeah, I just wanted to remind you that you have 5 min to conclude.
12:51:47 Yes, yeah, so I'll be done. I'll quickly go through these examples and we've time, yeah, great.
12:51:55 So we can reason on the graphs in pretty complex ways. We can also use isomorphic analysis.
12:51:59 So if I have 2 graphs that are not connected, right, you might say, well, that's great, but what if the 2 concepts are not connected?
12:52:05 Like if I have a graph on biological materials and another knowledge graph on Beethoven's symphonies, they have maybe absolutely no overlapping nodes because they talk about very different concepts.
12:52:15 So what I can use here, of course, is an isomorphic mapping, which is the kind of thing we've done with category theory back in the day.
12:52:21 And I can then find analogies between them and also use graph extension mechanisms to sort of extend one of the graphs and ask the question, how would the other graph extend?
12:52:33 And I can transfer knowledge and insights and mechanisms from one domain to another. So that is something we're working on right now.
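To make the idea of an isomorphic mapping between two graphs with no shared nodes concrete, here is a brute-force sketch. It is only practical for very small graphs and is not the method used in the work described; the two example graphs and their node names are invented.

```python
# Sketch: find a structure-preserving node mapping between two small
# undirected graphs by trying all permutations.
# ASSUMPTION: toy graphs with hypothetical node names; brute force is
# used for clarity and does not scale beyond a handful of nodes.
from itertools import permutations

def find_isomorphism(edges_a, edges_b):
    """Return a dict mapping nodes of A to nodes of B, or None if none exists."""
    nodes_a = sorted({n for e in edges_a for n in e})
    nodes_b = sorted({n for e in edges_b for n in e})
    source = {frozenset(e) for e in edges_a}
    target = {frozenset(e) for e in edges_b}
    if len(nodes_a) != len(nodes_b) or len(source) != len(target):
        return None
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if {frozenset(mapping[n] for n in e) for e in source} == target:
            return mapping
    return None

# A materials graph and a music graph with no nodes in common.
materials = [("fibril", "collagen"), ("fibril", "fiber"), ("fibril", "crosslink")]
music = [("motif", "note"), ("motif", "phrase"), ("motif", "theme")]

mapping = find_isomorphism(materials, music)
print(mapping)
```

Both toy graphs are stars, so any valid mapping must send the hub `fibril` to the hub `motif`. That is the kind of structural analogy the mapping is meant to surface.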
12:52:39 I can also do joint vision and graph reasoning. So if I have a graph representation, I can reason over these graphs.
12:52:44 If I have a multimodal AI model, I can actually give an image as well, like a painting, and I can reason over the information in the painting as well as the information in the graph.
12:52:55 And I can produce an answer. I can design a material. So this is what we've done in this particular case, we designed materials, actually,
12:53:01 by reasoning over a famous painting by Kandinsky and a graph on mycelium and sustainability to ultimately come up with this particular design.
12:53:11 And because this model is multimodal, I can actually not only make a visual representation of what this looks like, but the model will also tell me, as a scientist or engineer, what the different components are in there.
12:53:26 So I can actually have a real understanding of what this really means, and it is consistent, in the sense that I can understand that these predictions of what the different components and constituents are make sense as a materials scientist.
12:53:39 I can understand that. And it can go into great detail, of course, in the chemistry. So one of the really amazing things about this is that when you ask a lot of the AI generally about designs, it really just interpolates around things that it has already seen before.
12:53:53 This technology allows you to go far outside and really be innovative and creative and discover new things. And these new things are very detailed.
12:54:01 That's the other dimension to it. So not only are they new, they're very detailed.
12:54:03 So I just listed a couple of things here. So we talk about specific chemical functionalization, very specific length scales, very specific mechanisms, including manufacturing processes.
12:54:15 So this really sort of goes the whole way. The other thing we can do, and I'll end here, something that is not in the paper yet, but that's something we're working on right now,
12:54:23 is I can build a random discovery engine. What I mean by that is I can take this graph and maybe I don't know what I want.
12:54:31 Maybe I want to explore, so I can actually use the surface visiting to create a very large number of research ideas or projected material behavior ideas,
12:54:42 by combining, again, things that the model has learned with high fidelity, like predicting chemical properties,
12:54:47 with a knowledge graph, which provides more contextual relationships between ideas and concepts, and make predictions. And so again, you can look at one of these, and I won't go into all the details here, but you can generate some extremely interesting ideas.
12:55:02 Which in a multi-agent system you can then evaluate. What I've done for now, at least for the presentation, I put one of those in there and I looked into what's predicted. So this predicts a new way of combining DNA with hydroxyapatite in creating a biomaterial that has applications as a supercapacitor, by applying this
12:55:25 patterning technology to creating mesoporous carbon. So that's a very specific idea.
12:55:31 So I looked into the literature and wanted to see if that's been done.
12:55:37 And as far as I could find in the last day or so when I looked into this, it hasn't been done.
12:55:41 Right, so there's literature on DNA, there's literature on creating supercapacitors, obviously, from nanomaterials, but those 2 ideas have not been combined.
12:55:52 And so for this specific idea, for example, the model has learned, obviously through graph reasoning, to build a path,
12:55:58 connect these ideas and say something intelligent about how this can be done. So it's not just an idea.
12:56:02 It goes into great detail on how this can be manufactured, what properties you'd have. And you can do many of these.
12:56:08 Again, you can do tens of thousands. I think I did something like a hundred thousand or so over a few days.
12:56:14 Which can then be the basis for additional evaluation. So this is sort of a little advertisement for my courses.
12:56:23 If you're interested in this, some of you might be interested in coming to MIT; I teach 2 courses at MIT
12:56:27 in the summer. One is on campus, this one on the left, in June, and the other one's live virtual, and they talk about some of the things I've presented here.
12:56:37 And it's a fun way to learn and be involved if you're interested. So happy to discuss more with you if you're interested in coming.
12:56:42 So with that, thank you very much. And hopefully we have some time for questions.

Discussion

12:56:45 And, thank you, Markus. That was a lot of material that you provided. And I'm sure there are lots of questions that we need to kind of figure out because we have maybe about 10 min or so
12:56:56 at most. We're going to do the moderation around here because
12:56:57 Okay. Yeah. Yes, yes, I have myself 3 or 4 questions, but I will hold them back and ask you the most.
12:57:09 Yeah.
12:57:11 One to begin with and then request. John Sowa, Mike DiBellis and others are already asking questions
12:57:20 in the chat. So I will go through some of the chat questions. My biggest question is: your technique is an ideal candidate, and I think you touched on it at the last minute,
12:57:34 for integrating bio and material development. So a material that can coexist in a living system most effectively
12:57:45 can use all the things that you described to us today. So that to me is a low hanging fruit.
12:57:53 Good.
12:57:53 I hope you are working on that low hanging fruit. And that will be prosthetics, integrated medical devices, and things that can benefit humanity immensely.
12:58:08 Second thing is, I think you opened our eyes to a visual dynamic language of the future, starting with the knowledge graphs and reasoning that you did.
12:58:20 And it obviously changed a lot of our notions about how to integrate logic and statistics in the process. Thank you so much, but I will open now and request.
12:58:36 I think John Sowa has had his hand up for some time.
12:58:35 Phil Jackson, if you have a quick one. And please, John; John has like 10 questions in the
12:58:43 chat. John, kindly go ahead.
12:58:46 Okay, one question that I have is about the difference between DNA and most of these other kinds of materials, and that is that in DNA just one tiny
12:59:00 difference can make an amazingly huge difference in terms of the results. Whereas in most of these other materials, it's an average of a large, large number of particles.
12:59:14 And when you have an average, just a small difference between one particle and another won't make a big difference.
12:59:20 But when you get to DNA, just one tiny change can just reverse everything.
12:59:27 Right. Yeah, exactly. I mean, that's exactly the point. So this is why this type of modeling of hierarchical structures is really critical, because that's exactly right.
12:59:37 You can't average out. And so building graph representations of the way these systems behave is absolutely essential.
12:59:44 And you want to learn what's important, what's not important, in what context. Yeah, exactly.
12:59:48 So that's the basis for all the modeling I showed today. Yeah, I totally agree.
12:59:52 And that's the really big fallacy of conventional modeling a lot of times, whether it's in biology or engineering and science or other areas: we think we know, and we ignore things because we don't see the importance of that in the next level up.
13:00:06 And we say, yeah, let's just average it out. But actually, you can't do that.
13:00:13 No. But I think Fiji, who else has a question here? But I have one question, Markus, in terms of you are generating this knowledge using the LNS.
13:00:27 That's 1 of one way of generating the knowledge I guess. The knowledge graphs.
13:00:29 Yes, yes, so the knowledge gaps are created. Exactly, using, using LMS, from essentially from the sources that we like literature, for example, and we using the capabilities of those to be able to create.
13:00:44 notes and relationships and they're then assemble like there's a couple of steps involved in how we as a mechanics involved in how we do this of course and how we how we use the LM and what kind of LM and how they must be trained.
13:00:57 And how we ultimately, make the graph using embedding and so on. But essentially, yes.
13:01:05 And of course, you can add new knowledge to the graphs through special purpose AI, like, an ad that understands how to fool the protein or how to make predictions or you can of course add knowledge from physics simulation or experiment.
13:01:22 So the beautiful thing about graphs is you can always grow them. You can make them bigger, you can add new graphs, you can put contextual relationships between graphs through
13:01:27 graph path analysis, isomorphic mappings, in a lot of different ways.
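The point about growing graphs and relating them through path analysis can be illustrated with a minimal sketch. This is not the speaker's code: the dictionary representation, the function names `merge_graphs` and `shortest_path`, and the toy node labels are all assumptions made for illustration.

```python
from collections import deque


def merge_graphs(*graphs):
    """Union several directed graphs given as {node: {neighbor: relation}}."""
    merged = {}
    for graph in graphs:
        for node, edges in graph.items():
            merged.setdefault(node, {}).update(edges)
            # Make sure every edge target also exists as a node.
            for neighbor in edges:
                merged.setdefault(neighbor, {})
    return merged


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest node path, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical toy graphs, e.g. one extracted from text, one from simulation.
literature = {
    "silk": {"beta-sheet": "contains"},
    "beta-sheet": {"strength": "increases"},
}
simulation = {"strength": {"fiber design": "informs"}}

combined = merge_graphs(literature, simulation)
path = shortest_path(combined, "silk", "fiber design")
```

In this toy case a path from "silk" to "fiber design" exists only in the merged graph, which is the sense in which adding a graph adds contextual relationships.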
13:01:32 So there's not a single way, I would say. We don't have to get the graphs using LLMs, but there has to be a step involved in how to extract
13:01:39 knowledge, which is the connection between different features. And that's a big field to make it better.
13:01:45 Yeah.
13:01:44 I think there's a lot of room there to make this process better.
13:01:50 I have a quick.
13:01:48 And. Okay, and you have files.
13:01:51 Yeah, you have a quick question that we guys need to close.
13:01:55 Hug miles.
13:01:53 Yeah, thank you, Markus. Hi, I have looked at your work because you also have been working with David Spivak for a long time.
13:02:04 My question is that once you generate the olog structures from the text, how do you go back and make sure it follows all
13:02:14 the rules of the conceptual frame the olog provides, manually or through evaluation from different perspectives?
13:02:21 Yeah. Yeah, great question. So, in the work we've done in this paper that I just put up in the beginning, we didn't do a manual check, but in smaller cases we've done that, in the earlier work in the last few years. We've done this with a thousand papers; with one paper or two we can do that
13:02:44 manually. But what we have done, since we're looking at large data for these formats: you can do it manually, but there are actually ways.
13:02:52 Yeah.
13:02:50 This is what I talked about earlier a little bit, the mechanics by which we construct these knowledge graphs.
13:02:58 And so one way is, you can create an initial draft. This is what we do.
13:03:00 Of nodes and relationships. And then we provide a set of these, and then we ask the model, okay, now look at all the ones you've made from maybe a set of text, maybe the whole paper, maybe these sets of papers,
13:03:12 and try to unify the terminologies between them, and we do that step. So that already improves.
13:03:20 Yeah.
13:03:18 So for that you give a seed graph. Hello.
13:03:29 Yeah.
13:03:23 Yes, exactly. Well, so I provide the LLM with an example, and it's all in the paper in detail, but yeah. And I say, here's a text,
13:03:33 a sentence, and this is how a graph will look. And that's where you need to figure out what model you want to use.
13:03:41 And it has a huge impact on that, of course, so you have to find the model that does a good job with this. Yeah, I think the more examples you give, the better you can train it to do this, and I've also worked with some of those.
13:03:52 Exactly, and then there's also the embedding strategies, one of the things you can do. Yeah, you use LLMs, but it's very expensive.
13:03:59 And even though we have long context lengths, there are a lot of limitations. But the embedding model is something you can add on to this.
13:04:08 So instead of just saying, here's all the text, I can use embedding vectors of
13:04:12 nodes and edges. And it helps.
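As a rough illustration of using embedding vectors of nodes to unify terminology without passing all the text back through an LLM, here is a minimal sketch. It is an assumption-laden stand-in, not the method from the paper: `toy_embedding` uses character-trigram counts in place of a real embedding model, and `unify_nodes`, `relabel_edges`, the threshold of 0.6, and the example labels are hypothetical.

```python
import math
from collections import Counter


def toy_embedding(label):
    """Stand-in for a real embedding model: character-trigram counts."""
    text = f"  {label.lower()} "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unify_nodes(labels, threshold=0.6):
    """Map each label to the first earlier label that is similar enough."""
    canonical = []
    mapping = {}
    for label in labels:
        vec = toy_embedding(label)
        match = next(
            (c for c in canonical if cosine(vec, toy_embedding(c)) >= threshold),
            None,
        )
        if match is None:
            canonical.append(label)
            match = label
        mapping[label] = match
    return mapping


def relabel_edges(edges, mapping):
    """Rewrite (source, relation, target) triples onto canonical labels."""
    return sorted({(mapping[s], rel, mapping[t]) for s, rel, t in edges})


# Hypothetical labels as they might come out of separate extraction passes.
labels = ["knowledge graph", "Knowledge graphs", "protein"]
mapping = unify_nodes(labels)
edges = relabel_edges(
    [("Knowledge graphs", "describes", "protein"),
     ("knowledge graph", "describes", "protein")],
    mapping,
)
```

With a real embedding model the same loop would also merge synonyms that share no characters; the trigram stand-in only catches spelling variants.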
13:04:11 Okay. I would like to know more about the details. I'll contact you.
13:04:17 And I think.
13:04:16 Yeah, yeah, absolutely. Yeah, happy to talk. I mean, it's all in the paper, but, to me, yeah.
13:04:21 I saw the paper, but I also need more details. Yeah.
13:04:23 Have you? Yeah. Bye bye.
13:04:25 Oh, absolutely. Yeah, and I'll release the code as well. I haven't released it yet just because I have to clean it up.
13:04:32 Okay.
13:04:34 Yeah, I think.
13:04:34 It's all there. It's already on my GitHub. So, they'll be available. You can also take a look at this.
13:04:39 Yeah, but absolutely.
13:04:39 Yeah, that's great. I think we need to kind of come to a close here, Markus.
13:04:43 Thanks a lot. Thank you very much for the excellent talk. Also, I think you can, if it's possible, you can send your slides over so that we can put it on the web.
13:04:53 So. Yeah, yeah, sure.
13:04:54 So you can either send it to me or Ravi, and then we'll put it on the web.
13:04:58 And I think, actually, your talk must have given lots of people lots of questions to ponder.
13:05:08 Yes.
13:05:06 Come back to you later on. They can read your papers and see how this thing can be synthesized at some point.
13:05:13 So any last word, Ravi or Ken, before we
13:05:15 Yeah, just saying that we generally put abstracts from the chat on the session page, and Ken does that wonderfully.
13:05:27 So but in this particular case, I also had the idea that if we could send you the chat questions.
13:05:33 Yep.
13:05:33 Because they are very provoking and very useful. Maybe they can be useful for your work as well.
13:05:42 So.
13:05:42 Yes, yeah, I'm looking at it right now, and yeah, it'd be great if you can send them, please, and then we can continue the discussion.
13:05:49 I'm sorry I went, business the end. Hi.
13:05:51 I would like to request our organizers if we can have you come back again this year itself.
13:06:00 Yeah, no, I'd be happy to do that. I mean, maybe the next event, I can, I don't have to show a lot.
13:06:06 I can just maybe remind everybody what I showed. Maybe have some updates, but then have most of the time for discussion, cause I would love to.
13:06:07 Yes, yes, that would be wonderful. Thank you so much.
13:06:11 Yeah, yeah. Are you open to that? I would love to do that.
13:06:13 Yeah.
13:06:16 Absolutely.
13:06:17 Okay.
13:06:20 Yeah.
13:06:21 Okay.
13:06:16 Yeah, I think that's a great idea. So, thanks a lot. So again, with this I bring the meeting to a close.
13:06:26 Yeah, thanks to everyone.
13:06:32 It will be, but I'd like to remind everyone that next week we're having our first synthesis session.
13:06:40 Oh yes. Oh yeah.
13:06:41 And so bring your ideas to try to synthesize all the enormous amount of material that has been covered in our summit so far.
13:07:01 Thank you. So.
13:06:53 And with that, I wish to adjourn the meeting. Once again, thanks Markus. Really great talk.
13:07:01 Thank you, Ken. Thank you so much. Thanks everyone.
13:07:05 Well done, thank you.
13:07:10 Thank you.