Dr. Daniel A. Hashimoto - Computer Vision and Artificial Intelligence in Surgery
Expert / Speaker
Daniel A. Hashimoto
Surgery
Timestamps
8:41
Introduction to the topic
The speaker introduces the topic of AI in surgery
16:05
Importance of training data
The importance of collecting and using diverse training data for AI models
24:08
Use of generative adversarial networks
Using GANs to generate fake surgeries for AI model training
32:11
Challenges in open surgery annotation
The difficulties in annotating open surgery data due to visual occlusion and glare
46:40
Current limitations of AI in surgery
The current limitations of AI models in surgery, including lack of diversity in training data
1:01:10
Potential applications of AI in surgery education
The potential for AI to improve surgical education and training
Topic overview
Daniel A. Hashimoto, MD - Computer Vision and Artificial Intelligence in Surgery
Surgical Grand Rounds (September 11, 2019)
Intended audience: Healthcare professionals and clinicians.
Transcript
Speaker: Daniel A. Hashimoto
Good morning, everybody. Welcome to Surgical Grand Rounds. I'm Simala, one of the surgery fellows, and today I have the pleasure of introducing Daniel Hashimoto, who comes to us from MGH, where he did part of his surgery residency and is actually still working on it. He took some time off and developed a tremendous passion for artificial intelligence and surgical technology, and he is currently the Associate Director of Research at the Surgical Artificial Intelligence and Innovation Laboratory at MGH. He has a particular interest in surgical education, surgical simulation, surgical robotics, and surgical technology, and he is highly accomplished in each of these arenas. Today he will enlighten us on the use of computer vision and artificial intelligence in our day-to-day lives in the operating room. Thank you so much for spending your hour with us this morning. We are very interested in what you have to say. Thank you very much for the kind invitation and for having me today to speak to you this morning about computer vision and artificial intelligence, and some of the work that we've been privileged to get started at MGH and at MIT, just across town. I do have some disclosures. None of the work that I'll be talking about has been sponsored by any corporate entities; the work that I'll be presenting is all academic work. I want to give you some scaffolding around the discussion that I plan to have this morning in terms of artificial intelligence. AI is a term that you've likely heard thrown around quite a bit in the last couple of years, so I always like to level set with everybody and explain the way that I use the terminology. Different camps and different schools of thought within computer science structure these terms differently, so I do want to cover some of the basic taxonomy so that you can follow the rest of the talk.
I want you to understand what I mean when I say certain terms like machine learning, neural network, and deep learning, and how they relate to one another. Then I'm going to transition to talk a little more specifically about computer vision, which is the type of artificial intelligence research that we do over at MGH, and how it applies to surgery. And because I don't want to sit here and be the hype man for artificial intelligence, I will cover the limitations and the ethical concerns that come with working in AI. I want to close by talking about how surgical data could potentially help move AI forward in our field. Very simply and quite broadly, artificial intelligence refers to the study of algorithms that give machines the ability to reason and to perform cognitive functions. That's not to say that AI asks, can you get a machine to think like a human? It asks, can you get a machine to make certain determinations and decisions based on the data that it sees, something beyond a simple "if I push the space bar, put a space on the screen." And while most people think of artificial intelligence as a field within computer science, or maybe mathematics or statistics, the roots of the field actually go back to philosophy, psychology, neurobiology, and linguistics. These numbers are now outdated, but just to give you a sense of where the investments have been: investment in artificial intelligence has grown exponentially over the last five to seven years. Two years ago, in healthcare alone, there was $1.2 billion in VC money, and that doesn't include the big players like Google, Facebook, et cetera. And there were $34 billion in overall investments in AI in the United States alone in 2016. So while artificial intelligence has probably only come into the modern lexicon in the last 10 or 15 years, the field was formally organized in the 1950s at a camp hosted by MIT.
I believe it was up in Maine. But even though we've been working on AI for the better part of 60 years, it's really only now started to see increased interest. One hypothesis about why is that there has recently been a sort of big bang in artificial intelligence. First, we have access to more data now than we've ever had in our lifetimes. If we just think about healthcare data alone, we now have access to medical claims, pharmacy claims, and all the different insurer plans. We have the different patient registries being developed, whether surgery-specific ones like NSQIP or the STS database, or wider genomic studies and national studies like the All of Us initiative at the NIH. That's a lot of data, and most of it is slowly becoming more structured so that we can use it. We also have more computing power than we've ever had. If you think about the types of computers I'm working on, my laptop right now is significantly faster than the desktop I might have had 10 years ago, and this phone probably has more computing power than a desktop had five years ago. So we're seeing this increase in computing power as the computers get better and better. And finally, along with that increase in computing power has come the development of more powerful and more efficient algorithmic techniques that have allowed us to tackle more complicated and sophisticated problems in computer science. So this is the taxonomy and level setting that I want to cover with you. While we won't have time this morning to go into specifics about each of these aspects, I do want to give you one way of thinking about how all these fields relate to one another.
Artificial intelligence you can think of as the bucket field that covers how machines reason and think. Within that you have specific tools, and perhaps the most popular one right now is machine learning, which we'll get into in a second. Within machine learning you have neural networks, which are a specific technique that uses machine learning, and then deep learning, which is basically a more specific type of machine learning built on neural networks. Okay, so while AI may be the study of reasoning machines in general, machine learning focuses specifically on the algorithms and the statistical models that allow those machines to learn. And machine learning is really split between classical machine learning, which is what was done starting in the 1990s, and deep learning, which has really grown in the last 10 years or so. Classical machine learning involves taking data, identifying the variables, and then having a human, whether an engineer or a researcher, specifically define which variables are of interest. I'll use the classic example: you want to identify pictures of cats on the internet. As an engineer you might say, well, what do I think about when I think about cats, and how do I identify cats? I think cats have pointy ears, so I'll tell this network to look for pointy ears. Cats have whiskers, so I think this network should also look for whiskers. And my cat growing up had stripes, so I'm going to teach this network to identify stripes. You start feeding the algorithm the specific types of features that you think might be relevant to search for, and then it can look for those features and make a determination, based on previously labeled data, about whether your picture shows a cat or not a cat. In a somewhat more surgical example, you might think about labeling a set of data through supervised learning.
That is, a human is in the loop and labels the images so that the machine can learn. It's almost like teaching a toddler. You take a bunch of pictures of the gallbladder, for example, and say, dear computer, this is a gallbladder, this is a gallbladder, this is a gallbladder, and so on. The computer sees enough of these examples and starts to pick up on the mathematical features within the image that are consistent with identifying a gallbladder, so that when you give it a new set of unlabeled data, the machine can look at it and decide, well, this top one looks like a gallbladder to me; this bottom one, it may never have seen. That's part of a sleeve gastrectomy. It can't tell you it's a sleeve gastrectomy, but based on the training set it's been given, it can tell you, well, that's not a gallbladder. Building off of that a little further are neural networks. Neural networks are computational networks designed to mimic, at least at some abstract level, how neurons function. A network is designed as a series of layers, where each layer contains multiple neurons, or computational units, and those neurons interconnect with one another in such a way that they can fire in series or in parallel, allowing you to create some sort of mathematical function that then leads to an output or prediction. What does that mean more specifically? You can ignore the math here, but the general concept is that a set of variables enters the neural network, and each of those variables is assigned a weight. The sum of those weights may then exceed some threshold that has been previously set, allowing that neuron to fire, and the firing of that neuron contributes to the overall network's prediction of whether you're going to accomplish a given task.
As more and more data goes through the network, these connections are either strengthened or weakened depending on the mathematical function that was put in place. For a more concrete example, let's say that you are designing a neural network to identify a Ferrari. And I put a red Ferrari here because if I were to show this picture and say, everybody, think of a Ferrari in your head, most people would say, well, a Ferrari is red. That's sort of the biological analogy for how a neural network works. So if I have designed a network to identify a Ferrari, and this is very simplistic notation here, I might ask, what are some of the features most relevant to identifying this car? Well, this car is very well known for the scissor doors that go up. In my data set there may have been a couple of pictures of Hondas, so the network might say, well, I should probably learn what that logo is, because it's something I see in my data set. Oh, the Ferrari logo is there, and the color red seems to be important. Then weights get assigned based on how much each of those features is likely to contribute to the prediction: is this image a Ferrari, or is it not a Ferrari? If the network sees this particular image, it'll sum up the relevant weights and say, well, I exceed my threshold of greater than or equal to 10; therefore, this must be a Ferrari. Versus getting another picture like this and saying, well, I don't see any scissor doors, I certainly don't see the Ferrari logo, but this car is red. Oh, wait, I also see the Honda logo. This is not a Ferrari, quite obviously. Now imagine a network like this, but expanded beyond just the two simple layers here into 64, 74, or 144 layers. That allows you to accomplish much more complex tasks, and that's what deep learning is really meant to do.
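The weighted-sum-and-threshold idea in the Ferrari example can be sketched in a few lines. This is a toy single "neuron"; the feature names, weight values, and the threshold of 10 are illustrative assumptions, not the talk's actual model.

```python
# Toy "neuron" for the Ferrari example: each detected feature carries a
# weight, and the neuron fires only if the weighted sum clears a threshold.
# Feature names, weights, and threshold are illustrative assumptions.
WEIGHTS = {"scissor_doors": 6, "ferrari_logo": 5, "red_paint": 2, "honda_logo": -8}
THRESHOLD = 10

def neuron_fires(detected_features):
    """Sum the weights of the features present and compare to the threshold."""
    total = sum(WEIGHTS.get(f, 0) for f in detected_features)
    return total >= THRESHOLD

# A red car with scissor doors and a Ferrari badge clears the threshold (13 >= 10)...
print(neuron_fires({"scissor_doors", "ferrari_logo", "red_paint"}))  # True
# ...while a red car wearing a Honda badge does not (2 - 8 = -6).
print(neuron_fires({"red_paint", "honda_logo"}))  # False
```

Training, in this picture, amounts to adjusting the weights as more labeled examples flow through, which is what the strengthening and weakening of connections refers to.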
In deep learning, you don't need a human engineer to sit there and say, I want you to look for the color red, I want you to look for this Ferrari logo. All you do is give the network a set of data, and it has to learn the features on its own. In the prior example, I might just give it a data set of a thousand pictures of cars, where, say, 800 of them are Ferraris and 200 of them are Hondas, and I say, look at the pictures and tell me which ones are Ferraris. I want you to do the mathematical work on your own to figure out what the relevant features are. It may choose to look for the color red; it may not. It may choose instead to look at the angle of the headlights, things that you may not readily think of as the most obvious sources of signal for determining what that car is. This is why you may have heard some deep learning algorithms called black box algorithms. You don't get to pick what goes in, and it can be a little difficult to figure out why the network produced a certain result. But the power of this is that the network figures out which variables are most relevant mathematically, rather than being limited to variables that are interpretable by a human being. And that allows it to accomplish very complex tasks like deepfakes, which have been in the news recently. Deepfakes are a technology that has been developed using deep learning, and they allow actors, like the gentleman at the bottom here, to superimpose themselves on pre-existing video of real people. What it ends up doing is mapping the actor's face onto the target's face and then allowing the actor to mimic what that real person may have been saying.
I don't know if you saw the news recently, but for the first time there was a deepfake-enabled hack of a bank account, where attackers were able to mimic somebody's voice through the voice verification system on a telephone, and they transferred something like $15 million out of a Swiss bank account using deep learning. This type of technology is incredibly powerful and allows you to accomplish really complex tasks that would otherwise take a human programmer, you know, 50, 60, 70 years if every feature had to be engineered and every example labeled manually, and it allows you to do it at scale. And this type of deep learning is one of the things computer vision incorporates. While we've covered the big schema of what AI is, computer vision is part AI and part other fields, like signal processing, pattern recognition, and image processing. What computer vision really refers to is not so much, can I find a whisker, or can I find a pointy ear. It takes all that information, thinking about what the pixels are doing and what the specific objects are, and it tries to understand the context in which these objects or events are occurring. In self-driving cars, for example, in the Tesla Autopilot system, or at least the older Autopilot system, what the computer sees are the red boxes around this middle car here, an obstacle it won't want to drive into; the open green pixels are the areas in which it could potentially drive; and you see pedestrians on the side also labeled in red as potential obstacles. So it's assigning values to the pixels it's seeing in the image to get a better understanding of what's possible as it tries to navigate down the road. So computer vision has gotten powerful enough to somewhat semi-autonomously drive vehicles.
The question we asked ourselves is: what are the other applications for computer vision, particularly in surgery? But the inspiration for applying computer vision to surgery didn't come from cars, at least in our group. It actually came from sports. I don't know how many of you have read Moneyball or seen the movie with Brad Pitt. Very briefly, it's about the early-2000s Oakland A's and how they were able to use statistics to find sub-threshold statistical events that better described how players might perform in the context of a run toward the playoffs. They went beyond the traditional ways of thinking about data in baseball, where the easily countable events, home runs and strikeouts, are what you pay multi-millions of dollars for. If you don't have the budget to pay for home runs and strikeouts, you start to think about the other things that occur during the course of a game that allow you to pick up extra wins along the season. What's been fascinating to see happen in sports is the move from traditional manual methods like this to much more sophisticated levels of data, using computer vision to specifically track every motion that's happening on the field. You have real-time positional data that can help you determine, for example, how you need to shift players around in the field and how quickly those players can move, and it allows teams to make adjustments in-game and in building their rosters. So the thought was, if you can get this type of data using computer vision in cars and in sports, well, why can't we Moneyball surgery? Can we use this type of technology in clinical practice? The problem is that we traditionally have not stored this type of data in a way that makes it available for analysis.
What's the typical way that we store data? Well, we have claims codes, right? They tell you what happened, but they don't necessarily tell you how, and they don't necessarily tell you why; you just have a series of codes for billing purposes. If you look at operative reports, the operative reports are notoriously inaccurate or incomplete, at least in adults. Think about how many times you may have seen an operative report for a cholecystectomy where the critical view of safety was achieved, the duct was clipped in standard fashion, the gallbladder was dissected off the gallbladder fossa, and there were no complications, and yet two days later there's a bile duct injury. Right? You wouldn't get enough information from the operative report to figure out why. Studies done in the Netherlands that compared operative video to the dictated reports found that 27% of iatrogenic injuries weren't documented in the operative report, and that video was superior to the operative report, or even photos, in documenting the steps and events of an operation. But the problem is that nobody has time to sit there and watch videos of every single operation that's ever happened to try to figure out where things go right or where things go poorly. Even at M&M, right? We have an hour for M&M in the morning; you can't watch an entire operation to try to figure out where in the process things could have gone wrong and where the complication could have occurred. So we took the lessons that we learned from the sports and automotive industries, and we first started with a small pilot study that looked at just 10 patients, 10 videos, and we did classical machine learning. We used feature engineering to specifically identify things like color, texture, and relevant anatomic position.
So we had to look at the instruments, and at how the lines within the image field converge, to try to pick up on where a point of focus is, and then we translated this data mathematically so that it could be interpreted by a computer. The goal was to see whether a computer can understand what step of an operation is currently happening. We had a small amount of success using some of the classical machine learning techniques, and that gave us enough hope, I guess you could say, to pursue a deep learning approach, which is much more involved. So we expanded the data set to 88 cases, which we presented at the American Surgical this year. We're now closer to 500 or 600 cases, but for the initial data set we got 88 cases of sleeve gastrectomy, and they were annotated by two of our bariatric surgeons across town into seven specific steps. We picked sleeve gastrectomy because it's straightforward: it's a straightforward sequence of events, and most of the surgeons, at least at our institutions, share the same sequence of steps. So the annotators were able to break down the video into time stamps: from this point to this point, ports are being placed; from this point to this point, the stomach is being stapled. These labels were then fed into a neural network model. First we started with a visual model; we used a specific type of deep learning called a residual neural network. That is a purely visual model: it takes a frame, a still image from a video, and tries to make a determination, based on visual data alone, about which step of the operation it came from. And it's a lot of data, because video runs at about 24 frames per second, so you can imagine how much data builds up over the course of an hour-long operation.
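The annotation scheme described here, surgeons marking start and end timestamps for each step, which then become per-frame training labels at roughly 24 frames per second, might look like this minimal sketch. The step names and times are invented for illustration.

```python
FPS = 24  # the talk notes video runs at roughly 24 frames per second

# Hypothetical annotation: (step_name, start_second, end_second) spans,
# e.g. "from this point to this point, ports are being placed".
annotation = [
    ("port_placement", 0, 120),
    ("gastrocolic_dissection", 120, 900),
    ("stapling", 900, 1500),
]

def frame_labels(spans, fps=FPS):
    """Expand timestamped step spans into one label per video frame."""
    labels = []
    for step, start, end in spans:
        labels.extend([step] * ((end - start) * fps))
    return labels

labels = frame_labels(annotation)
print(len(labels))             # 1500 s of video * 24 fps = 36000 frames
print(labels[0], labels[-1])   # port_placement stapling
```

Even this toy 25-minute case yields 36,000 labeled frames, which is why the speaker emphasizes how quickly the data builds up over an hour-long operation.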
But we all know that an operation is more than a single still image; it's a sequence of events that occurs over time, multiple still images over the course of the operation, and we felt that was important. You're not going to start stapling your sleeve if you haven't put your ports in yet, and that's valuable information a network should have to help improve its predictions. So we used a second type of neural network called long short-term memory, which is able to account for the temporal process in which events occur and then make a determination about which step it thinks it is seeing. The training process involved splitting the data into a 70% training set; the training set is the video with the labels of what step is happening when, and that's fed into the networks so that the network can learn these associations. We then take the 30% test set and feed it the video only, at which point the machine has to decide, based on the visual image and the temporal sequence it sees, which step is actually occurring. What ultimately matters is how well it does against human surgeon annotators. So what did we find? Well, the visual model alone had about 82% accuracy in identifying what was happening, so fairly reasonable accuracy just looking at the picture alone. When we added the temporal component, we got a slight increase in accuracy, to 85.6%. You're probably looking at this and saying, well, 85.6% doesn't sound very good; if I want artificial intelligence working in the operating room, I want 100% accuracy. Well, I agree with you. The problem is that I can't get two humans to agree more than 86% of the time on what the steps of an operation are. So if I want a machine to be 100% accurate, I'm asking for superhuman performance, because I can't get two attendings to agree on when a step starts and when a step ends.
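The evaluation setup just described, a 70/30 split by case and frame-level agreement as the metric, can be sketched as follows. The case count of 88 matches the talk; everything else (seed, toy label sequences) is invented, and the same agreement function works for model-versus-surgeon and surgeon-versus-surgeon comparisons.

```python
import random

def split_cases(case_ids, train_frac=0.70, seed=0):
    """Shuffle whole cases (not individual frames) into train/test sets."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

def frame_accuracy(predicted, reference):
    """Fraction of frames where two label sequences agree."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

train, test = split_cases(range(88))   # the 88 sleeve gastrectomy cases
print(len(train), len(test))           # 61 29 -> 61 train, 27 test

# Two annotators who disagree only near a step boundary still cap agreement:
a = ["step1"] * 90 + ["step2"] * 110
b = ["step1"] * 100 + ["step2"] * 100
print(frame_accuracy(a, b))            # 0.95 -- boundaries drive disagreement
```

This is exactly why the ~86% human agreement ceiling matters: two annotators who name every step identically but place boundaries a few seconds apart still score well below 100% on a frame-level metric.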
The actual difference sounds crazy. It's not that people thought, oh, this is port placement; no, no, this is stapling the stomach. It's actually about when the transition happens: when do you go from dissecting the gastrocolic ligament to dissecting closer to the hiatus? That is where attendings differ, and it may differ by two seconds, by five seconds, by a minute or two. But when we start thinking about developing AIs that can automatically analyze surgical video for the purpose of potentially providing guidance, those seconds matter. The holy grail we're trying to build toward is using real-time identification of steps to predict potential problems as they're happening. But right now our problem is that I don't have anything to train the machine on. So we're really having to work hard, and one of the lessons we learned from this is that we need some sort of standards to define what it means to be engaged in an operative step. What does it mean to be clipping your duct? Is it the moment the clip applier comes in? Is it the moment the clip applier goes around the duct? Those are the kinds of things on which surgeons have been differing in the way they annotate video, and these are problems that will prevent AIs from becoming more and more accurate, because an AI can only be as good as the data you train it on. To give you a more solid grounding on how this actually looks, I want to show you a video. To orient you: the rightmost frame here is the actual video of the operation. The middle frame is a probability map. On the y-axis are steps one through seven of the sleeve gastrectomy, and red indicates high confidence. So if you see red, the machine is saying, with high probability I think you are in step one.
Blue is very low probability, and green-yellow is somewhere in between. The map is scrolling from right to left across the screen, and the machine is actively analyzing this rightmost bar here. We don't have any user experience or user design people in our lab, so this is obviously not something that's actionable in the operating room; it's just a way to conceptualize and visualize the data. The first case is a very straightforward sleeve gastrectomy, and as the case progresses you see the probability map move from step one to step two to step three in a very straightforward, stepwise manner. There doesn't appear to be a lot of variability, and there don't appear to be events that confuse the machine. In case two, however, you can already see a significant number of adhesions and distortion of the anatomy, and as the machine analyzes it, you start to see some noise in the signal. As it processes, it starts to show some confusion about what's happening, until some semblance of recognizable anatomy is restored, like the stomach here, and then the prediction starts to stabilize. When you compare the two cases, with the straightforward case on the left in case A and the slightly more complex case in case B, you can immediately see a visual difference in the probability of events occurring. In case B, when you look at the elements here, here, and here, where you see a lot of overlap, those are areas where the machine is detecting deviation from a normal expected operative course. We've extended our analysis and are doing preliminary work now on these types of events, and we're starting to see the potential here to identify areas where complications may have occurred.
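One minimal way to turn the "overlap" regions of such a probability map into flags is to mark frames where no single step is predicted with high confidence. The probability values and the confidence threshold below are invented for illustration, not taken from the talk's model.

```python
def flag_deviations(prob_rows, min_confidence=0.6):
    """Flag frame indices where no single step reaches min_confidence --
    the 'overlap' regions of the probability map where the case deviates
    from the expected course. prob_rows: per-frame lists of step probabilities."""
    return [i for i, row in enumerate(prob_rows) if max(row) < min_confidence]

# Case A: the model is confident throughout (clean stepwise progression).
case_a = [[0.9, 0.05, 0.05], [0.1, 0.85, 0.05], [0.0, 0.1, 0.9]]
# Case B: adhesions distort the view and probability mass spreads out.
case_b = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.5, 0.3, 0.2]]

print(flag_deviations(case_a))  # []
print(flag_deviations(case_b))  # [1, 2]
```

In this sketch, case A produces no flags while case B flags the two ambiguous frames, mirroring the visual difference between the two probability maps in the talk.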
Whether it's an enterotomy or bleeding, those are the events that tend to fire when what the machine is seeing doesn't fit within the probability distribution of events it was expecting. Based off of this, we've started to think about the potential implications of this type of technology for surgeons. The first, and probably the simplest, is to use it not in real time in the operating room, but after the fact, to rapidly and automatically index and bookmark video by operative step, whether you want to use it to present at a conference later on, or to automatically index videos that you may want trainees to view and learn from. We think this would help with case preparation, with coaching, and with feedback, and at MGH we have certainly offered our services to the residents: when they want to present a case at M&M, or review one of their videos, we have the ability to do this for them, at least for a handful of cases. We also think this sets up the foundation for adverse event detection: if you detect events like bleeding, you could identify those and notify people postoperatively. Think about having a sticker on the chart where you visually see case A or case B, and knowing that case B may be a setup for a potential complication and may warrant additional monitoring that deviates from the typical postoperative pathway. Other applications we've been exploring include telementoring, so thinking about when you are deviating off an expected course. My family is from Peru; if you're down in Peru and you want to link to assistance here at Boston Children's or MGH for backup, you could automate that process, because sometimes people may not recognize that they're getting into trouble.
Or if you have trainees who have started a case for a faculty member, you might want automated notification that you should probably come down from your office or from clinic to join the case, because the trainee is starting to deviate from an expected operative path and could probably use your assistance earlier than they're willing to call you for. And then finally, for morbidity and mortality review, as I mentioned. Now, we're not the only group working on this type of technology, and I certainly want to introduce some of the other groups around the world that are working on this. The IRCAD group in Strasbourg, France, also has a very big artificial intelligence and surgery group, and they have been working on operating room logistics: using deep learning to identify what step of an operation is occurring, specifically to estimate how much operative time remains as the case progresses. They're trying to use this to better arrange their operating rooms. If they find that a surgery is getting into some complexity, with potential deviation from the expected operative path, the prediction of the estimated time remaining can be updated in real time and allow their OR logistics folks to rearrange the operating rooms accordingly. Another big group is Teodor Grantcharov's group at St. Michael's Hospital at the University of Toronto, and you may have heard some of the work they're doing. They've built an actual device called the OR Black Box, and it's meant to function like the black box recorder you see on airplanes, but for the operating room. They come in and outfit the entire operating room with additional cameras to capture not just the intraoperative view of the patient, but the whole view around the operating room itself. They actually track the position of everybody in the room. They track who's coming and who's going.
They track how people are communicating with one another: is somebody yelling at somebody else, is somebody not closing the loop on communication? This is real big brother stuff happening in the operating room in Canada, and they're using this technology to do quality improvement initiatives, to get a sense of the different risks that occur throughout the course of a case. I think they recently published a paper, I believe in JAMA Surgery, that looked at distractions in the operating room and the incidence of complications relative to the number of times somebody came in and out of the operating room, which they were able to track automatically using some of their computer vision algorithms. They've also been able to use a lot of that data to work specifically on instrument usage. They're able to identify, with high degrees of accuracy, specific instruments being used during a case and track those intraoperatively, with the goal of getting quantitative measures of how people are moving, without needing to install extra sensors on the instruments themselves. I'm also part of an international collaboration working right now on automated assessment of these steps: using deep learning to automatically extract features that assist in the recognition of anatomy. We're working closely with SAGES, the Society of American Gastrointestinal and Endoscopic Surgeons, and their lap chole group. One of the things we're trying to build out with them is a GPS for surgery, a Waze for surgery. We have surgeons at SAGES engage in an educational platform that sets up different video scenarios. It presents them with an image, either of a thyroidectomy or a cholecystectomy.
And it asks them to note: okay, if you were to start this operation, where would you start your dissection? Or, if you were to start this operation, where do you think the recurrent laryngeal nerve is? And you can have multiple people engage in this web platform and annotate it. And what it ends up doing is it builds a heat map of where different percentages of experts may have wanted to start their dissection. And the beauty of this is that you can use these expert annotations in two ways. One, you can have junior residents go through it, do the modules themselves, and compare their performance to the experts. And that's an immediate way to use annotation for an educational purpose. Two, we can use this background data to train an artificial intelligence algorithm. I lost my screen here. Let's see. My laptop's on, but the screen itself is off. Is there a way we can turn the projector or the screen back on? Let me try this. Yeah, exactly. It's an input signal issue, not sound. If only we had a machine to fix it. Here we go. I'm plugging it back in. It's gone again. All right, let's try a different port. All right, now I'm not going to touch it. So we can use the background data on the annotations to actually train deep learning algorithms to try to predict or understand where a certain dissection should start and stop, with the goal being not to tell you where you should start your dissection, but to provide you with additional data. If you were being coached in real time by somebody else, what would they recommend? If you were to take a group of experts in the room, from Children's, from MGH, from Hopkins, from Cleveland, whatever it may be, what would they do? It allows us to look at where some of these annotations are coming from and what the differences are in style and technique. But as I said, I'm not here just to hype up AI.
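The expert heat map described above amounts to binning click coordinates into a grid. A minimal sketch, with made-up image dimensions, grid resolution, and click coordinates (none of this is the actual SAGES platform's implementation):

```python
import numpy as np

# Hedged sketch: aggregate expert "start your dissection here" clicks
# into a coarse grid. Sizes and coordinates are illustrative assumptions.

def annotation_heatmap(clicks, width, height, bins=8):
    """Bin (x, y) expert clicks into a bins x bins grid, normalized so each
    cell holds the fraction of experts who chose that region."""
    grid = np.zeros((bins, bins))
    for x, y in clicks:
        gx = min(int(x / width * bins), bins - 1)
        gy = min(int(y / height * bins), bins - 1)
        grid[gy, gx] += 1
    return grid / max(len(clicks), 1)
```

A junior resident's chosen point can then be compared against the high-density cells, and the same grids could serve as soft labels when training a model on the background data.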
And just say that AI is the greatest thing that's ever happened, that it's a silver bullet, that it's going to fix all of our problems, and then we can all go home and drink coffee and the robots will operate for us. That's not my intent. That's not my goal. I really want to highlight that AI, while it's certainly huge right now in the way that we talk about it, is still very much in its infancy, particularly in medicine and even more so in surgery. So, the big things, the common things that we all hear about in statistics: correlation is not causation. And almost all of deep learning right now is correlation. This is all "which variables correlate with the outcome that you want." It is not "which variables cause the outcomes that you want." There's a lot of work being done on trying to build out causal algorithms, algorithms that can actually do that analysis and figure out what's causing a given output. But we're not at the stage yet where that's actually being implemented; it's at a very theoretical stage right now. As I mentioned, a lot of these algorithms, particularly within neural networks and deep learning, have interpretability problems, the black box problem, where you can't look inside and see why it is that this variable correlates with this output. It's not like a linear regression, where you can look at all the different variables and see how much each variable contributes to the final prediction. You can't really do that. It's also very dependent on large data sets, and for at least this type of computer vision problem, we don't have those large data sets. The biggest data sets that exist number in the hundreds of cases, and we probably need close to tens of thousands, hundreds of thousands of cases. Supervised learning needs a lot of good annotation. And so, as I mentioned before, with those two surgeons that disagreed, they may have disagreed on 14% of the cases or so.
But the fact that they were able to invest the time to annotate all those videos says something, and it also says something about the cost. I can't imagine that the hospital would be thrilled if all of you took two days out of the week just to sit there and annotate video for an AI to learn. That's just not a good investment of time. And so we need to come up with creative ways to get people to annotate, either as part of their daily jobs or as part of an educational exercise, like I mentioned before. And then, perhaps most importantly from a societal perspective, there are a lot of systemic biases in the data that we use, whether they're explicit or implicit. Think about some of the early work that's been done in cardiology and the implicit biases that were in place in the way people decided who gets a PCI or who gets put on certain types of medications. That type of bias is baked into the data that we've all been analyzing for decades. And so now to come along and say, well, I have all this data that I've built up for 40, 50 years, I have millions of patients, I should just use that to train an AI — we really need to understand that these data may not give us the exact answer that we're looking for. We have to recognize what the limitations of our data sets are. We really need to make sure that we're choosing data sets that are appropriately representative of the patient population that we want to analyze, to ensure that we're making the right determinations from these complex analyses. One other interesting thing that I want to bring up, another analogy to cars: last year, the MIT Media Lab did what I think is a really interesting study, called the Moral Machine. They were thinking about the ethical implications of building out AI in self-driving cars.
Because right now, as we are driving and are potentially going to hit somebody crossing the road, we have a decision to make. Do we try to swerve out of the way, or do we continue forward? And they set up a lose-lose scenario. Somebody dies in each of these scenarios, whether it's the people in the car that's being driven autonomously or the pedestrians outside of the car. And they set up 13 different scenarios with different combinations of people. There's a family in the car. There are old people crossing the street. There is a young person crossing the street. There's an old person driving the car. There's a rich person, a poor person, a pregnant woman, a cat or a dog. And it sets up these scenarios where some living being has to die, and you have to decide who dies in each scenario. And the underlying hypothesis was: well, we will converge on one underlying set of ethical principles that will govern self-driving cars, and then the car industry can use this to determine how a self-driving car should react if it were put in this situation. And they distributed this around the world, and 2.3 million people answered the survey. And what they found was that it depends on where you live how you value certain lives. They basically broke it down into three cultures: a Western culture, an Eastern culture, and a Southern culture. And depending on what type of culture you grew up in, you chose differently. So in the Western culture, the preference was for what's called inaction. So if a self-driving car is potentially going to hit a pedestrian, or it's going to swerve and crash and kill you, the preference was for it to not do anything, to continue its course of action and just hit the pedestrian. The thought being: well, if the car didn't engage in an action and ended up killing somebody, that person must have run out in front of it.
If you were in a Southern culture, there was a preference for sparing higher-status individuals, people that were labeled as rich in the diagram. And if you were living in an Eastern culture, you were more likely to spare the pedestrians that were crossing the street over the people in the car. And so this has now created a problem for the automotive companies, because now they need to create, at minimum, three sets of AI to govern the principles under which these self-driving cars are going to function. And I think there are very real implications for surgery as well. I mean, we can't even agree on a way to do an operation between MGH and the Brigham, much less across the country or across the world. What is the right situation in which you should offer a certain operation? What is the technique that you want to use? Does the same technique have the same actual outcome in every surgeon's hands, or are certain techniques better in certain surgeons' hands versus others? It depends how you trained, it depends on what your values are, it depends on what the patient's values are. And these are all the big ethical questions that are going to need to be addressed as we push to develop these AI algorithms. But despite these limitations, and despite the huge concerns that we're still working through, I think the promise is worth the pursuit. The idea is that you can take the population-level data that we have right now and combine it with individual patient-specific data and really take this sort of personalized medicine approach to surgery, not just in the traditional ways of thinking about it — oh, we're doing genomics studies or targeted therapies — but really thinking about it even from an intraoperative perspective.
Actually, this time around, I'm going to do an antecolic anastomosis versus a retrocolic anastomosis based on this specific patient's profile and my specific skills as a surgeon — getting very specific types of advice and data that you can then act on. The problem is that we'll never get there if we don't actually start putting all the data together. And so that's why we've started to try to build out a centralized repository of data. The strength of this kind of database really comes from combining multiple institutions. You know, each institution — while we're very high volume here in Boston, we're still not high volume enough to train an AI, even if you take decades' worth of data. And so the key really is collaborating and working together. The idea is building out a database that can have access to not hundreds, not thousands, but millions of operations, which is orders of magnitude more than we would ever see in our lifetimes as individual surgeons, to be able to use algorithms to really understand, at a broad quantitative level, what some of the differences are that are occurring. And so the idea is to put together this concept that we're calling a collective surgical consciousness: the idea that you can use a computer to identify and track this data in real time and then pull from the database to provide very specific data to you as a surgeon. When you're in this situation, you have an 80% chance of a complication; here are five alternative things that you could potentially do, and here are the predicted probabilities of complication related to each. And then you can make the decision on what you want to do. And while being in a large academic institution in a big city allows us access to a lot of expertise, that's not the case for everybody, even in the United States, much less the world.
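The decision-support idea above — alternatives presented with their predicted complication probabilities, with the surgeon making the final call — can be sketched in a few lines. The actions and probabilities here are entirely hypothetical, purely to show the shape of the output such a system might surface:

```python
# Illustrative sketch of ranking candidate next actions by a model's
# predicted complication probability. All names and numbers are made up.

def rank_alternatives(options):
    """options: list of (action, predicted complication probability).
    Returns them ordered from safest to riskiest for the surgeon to weigh."""
    return sorted(options, key=lambda pair: pair[1])
```

The point of the design is that the system ranks and quantifies, while the decision stays with the surgeon.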
And so the question is, how do you distribute the type of expertise that's in a place like here and send it out to South America, send it out to Africa, send it out to rural parts of India? And what's the most efficient way to do it? Is it efficient to send Dr. Quenka to South America every year for six months at a time? Or is it better if we can imbue some sort of machine with that knowledge, to then say, well, let me show you visually how Dr. Quenka would do this? Does that mean that the surgeon on the other end can execute it? Not necessarily, but the goal here is to get a baseline level of surgical skill distributed, and then think about how we can distribute decision making and knowledge in an equitable way. And so, to close things up, the near future of AI-enabled surgery is not automation. Just like in medicine, it's not automation. Pathology, for example — there was a huge study published in Nature a couple of years ago where they trained an AI algorithm to identify metastases in lymph nodes from the axilla for breast cancer. And what they found was that they took a group of pathologists — I think it was 100 pathologists — and their error rate was actually seven and a half percent. The AI's error rate was 3 percent. And so, off of that, you would think, well, we should have AI looking at all these slides. But the third arm of the study was to take the pathologists and have the AI screen the slides first, narrow down the areas where the probability of metastases was the highest, and then have the pathologists look at those. And the combined error rate was 0.5 percent. So the idea here is not to automate us away, but really to augment the clinical care that we're delivering, to have the machine and the human work together, with the ideal of really getting the best possible care that we can for our patients through technology-enabled means. The work that I do at MGH is part of a huge team, and we've been growing quite rapidly.
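The screen-first, review-after workflow from that pathology study can be sketched structurally. This is a hedged illustration, not the study's actual pipeline: the scoring function and the fraction sent for human review are assumptions, and the point is only that the model narrows the human's attention rather than replacing the human.

```python
# Sketch of an AI-screens-first, human-reviews-after triage step.
# The review fraction and scoring are illustrative assumptions.

def triage(regions, model_score, review_fraction=0.2):
    """Rank slide regions by the model's predicted probability of metastasis
    and return only the top fraction for the pathologist to review."""
    ranked = sorted(regions, key=model_score, reverse=True)
    k = max(1, int(len(ranked) * review_fraction))
    return ranked[:k]
```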
Ozanan Meireles is the clinical director over there, a bariatric surgeon. Daniela Rus is the head of the computer science and artificial intelligence lab at MIT and has been instrumental in getting this started. And Guy Rosman, who's our director of engineering, has been working very closely with our postdocs and our fellows, trying to build out new types of algorithms that we can use to accomplish these aims. And I want to give credit to the international collaborators, what they call the rhombusoid group. They've been working on some of this data collection, annotating specific planes and trying to use that to identify where you should start an operation. And ultimately, the call to action that I would put forth to surgeons is to really think about, if you don't already, what are the different protocols that you might want to implement to be able to record cases, with patient consent of course, to potentially use for future studies. Similar to the way that we biobank tissue, should we be banking videos and surgical data from the operating room, so that we have the data available as the technology starts to catch up? You can certainly engage in helping to annotate cases; certainly our group would love that help, as would other groups. And we're always, of course, willing to engage with others in brainstorming for ideas and projects. I think most importantly, this field in particular really requires collaboration with our data scientists and engineers. This is not something that, as surgeons, we can sort of pick up, hit the neural network button on Stata or SAS, get an output, and expect that we can contribute something meaningful. The way this field moves, it is so crucial to have that expertise also focused on it every day, and that's why we really want to promote this collaboration. So my contact information and our lab's contact information are here, and we're certainly happy to engage with anybody and everybody on it.
At least having an initial conversation on how we could help with any ideas that you may have. So thank you very much. Well, that was just terrific. You did a beautiful job of taking a sexy topic and educating us on the basics of the different types of AI, and then saying what's possible now and what requires the collective consciousness. I'm interested in — well, all the pictures of those people were young. Yes, partly that's because the technology has been developed recently, but the other point is that this stuff takes a while. So I'm interested in you speculating about when you have no hair. When I drove in this morning on the Mass Pike, I'll admit I was doing email. Okay, my car is very good at driving down the Mass Pike in a straight line, and even if two cars in front of me put on the brakes, my car knows to slow down even before I see those brake lights. So I get that. Okay. And even on side streets, it does a pretty good job, until there's some construction and there's a guy standing there with his hand up saying wait, or even holding one of those stop-and-go signs, saying you go, then you go, taking turns. So I understand that, just like you can collect from millions of cases, the company who built my car is collecting every time I don't do what the computer tells me to do, and it's learning, and it's collected billions of miles of experience. But just like in driving, in surgery near perfection is required. So I have trouble believing, in my old brain, that my car is ever going to recognize the guy holding the sign or holding his hand up. Now, my son, who's younger than you, drank the Kool-Aid and said, Dad, anything that your eyes and your brain can process, a computer can learn to do. He drank so much that he's now a design engineer for that company. But I still don't believe that my car is ever really going to let me not sit in the seat. And maybe that's me being old.
So for image-based questions, like the pathologist or the radiologist, it's real. It's happening. Within the next five years, I'm sure we will see pathologists, radiologists, dermatologists being augmented or in part replaced. Right, take a picture — that's already happening live, and in 15 minutes you can get back from the website what your rash is or what that mark on your skin is, replacing the humans. If you go to the end of your career, okay, where are we going to be? Yeah, so to exactly your point, I certainly don't think it's going to be at the level where we're just going to sit back and drink coffee while the robot operates automatically. But I do think that we will be at the stage where you will have probabilistic modeling in real time throughout the case. I do think that we're going to get to the point, certainly by the end of my career, where it will at least be able to give you some probabilities of the events that are occurring and what may occur. Now, how accurate that is will vary depending on the type of edge scenario. And what I mean by that is: how rare are the events that you're encountering? And so one of the things that I neglected to mention is that it's not just the machine saying, oh, you should do this next, or here are the options that you could potentially engage in as you continue the operation, but what confidence the machine assigns to those predictions. Because you want to know, if a machine is saying, I think this is the cystic duct and this is the cystic artery — it's not good enough for me for a machine just to say, this is the duct and the artery. I want to know: how confident are you, machine, that this is the duct or the artery? Because if it tells me, I'm 50% confident that this is the duct, I'm going to probably ignore it. But if it says, I'm 99.9% sure, then it's really a confirmation.
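The confidence-gating behavior just described — only surface an anatomy label when the model is sure enough to be worth listening to — can be sketched as a threshold on the softmax output. The 0.95 cutoff and the label names here are illustrative assumptions, not any deployed system's values:

```python
import math

# Hedged sketch of confidence-gated anatomy suggestions: withhold the
# label when the model's confidence is below a threshold (assumed 0.95).

def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_label(logits, labels, threshold=0.95):
    """Return (label, confidence) if confident enough, else (None, confidence)."""
    probs = softmax(logits)
    best = max(probs)
    label = labels[probs.index(best)]
    return (label, best) if best >= threshold else (None, best)
```

A near-50% call gets suppressed, matching the "I'm going to probably ignore it" behavior, while a 99.9% call is surfaced as a confirmation.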
It's like being able to go to a second person and say, hey, do you think this is the common bile duct? Do you think this is the cystic duct? So I think we will be at the point where, at least for the basic cases, we will see this anatomic identification, and we will see some suggestion of what steps may be necessary, or some suggestion of whether a complication is likely to occur based on some probability curve. And I agree with you: I don't think we're going to be at a stage, for a complex operation like is often done here, where the computer is going to tell us how to do a really complex pediatric case. The example of the car is funny: the first Tesla death that happened — when you review that data, it was a person in Autopilot driving through the desert who crashed into a semi-trailer. And when you look at the picture, no human would have ever made that mistake. The semi-trailer was one of those gray, metallic, reflective semi-trailers. And the environment was such that it looked like the desert was reflecting off of the semi-trailer. And so what the car perceived was that it was driving through open road, because it was all blue from the sky reflecting on the semi-trailer and brown from the sand, and it perceived that as open road. And so it never slowed down, and it crashed at 80 miles an hour into that semi-trailer. That's a mistake no human would ever make. No human would ever look at that scenario and say, I can drive right through that truck. And so there are very interesting cases, much simpler than a guy holding a stop-and-go sign, where we just don't think of these scenarios — oh, I would have never expected a machine to have trouble with that. And that's why these simulated miles are so important.
And that's why I think collecting cases, even when they're not being used for AI, is important, because you've got to know what all the possibilities are to try to model and account for them. In that regard, you can see utility for this information that's invaluable right now, which is to say: if you've got these series of circumstances that you've documented and filmed where things have gone wrong, you can show not only our trainees but, quite frankly, more experienced surgeons as well — this is where trouble starts, this is what trouble looks like. And that gives you a wealth of experience that sometimes you won't gain for decades, and you could think, boy, I could see this in a relatively short period of time. Oh, absolutely. I think the first applications of this are going to be in the educational arena. It's not going to be that this is in the operating room five years from now helping you operate. It's going to be that five years from now, this type of technology is an option for residency programs and fellowship programs to use to try to make training more efficient by providing concrete examples. Rather than talking about it at a conference, or reading a paper and talking about complications, you'd actually be able to review the database: oh, this is the complication that I was telling you about, that I've seen once in my career, but that was captured through the process of trying to build out these algorithms. I think this is awesome. I can totally see your vision of, you know, autonomous safety based on digital inputs, and I get that. I have a social question, because what I see — maybe my vision is off — is that all the routine cases, gallbladders, gastrectomies, kidney transplants, all the more or less routine cases that we all learn in our first two years of surgery, are all going to be done with digital guidance. And we're going to have a class of surgeons who, instead of having digital assistants, are going to be surgical assistants.
They're going to watch this thing, and they're going to be able to correct it. But we're going to have a second class of surgeons who can do the really hard cases — the second redo liver transplants, you know, the complicated thoracic stuff I do. Where are they going to get trained? Because if you raise these surgeons to be dependent on their computer to do everything, and then you launch them into the complicated stuff, it's going to be very tough. We have a tough time now with our residents. I don't know. As you know, a fourth-year resident now is in many ways equivalent to what we were as interns in the ancient days, when, you know, an appendectomy was an intern case. That's no longer true. So how are we going to train the surgeons for the really tough stuff, or are we going to have two training tracks? Yeah, that's a great question, and it's a question that we struggle with all the time. Dr. Lomo brings it up all the time, actually, across the city. It's honestly why the incorporation of this technology is not a do-what-I-do, color-within-the-lines sort of scenario. The surgeon is still operating, the surgeon is still making decisions; this is a coach to have along the way. Now, you're right that that could come off the rails, and somebody else could decide to make it a "no, you have to do this." You could easily see some regulator saying, if you don't do it this way, you're not going to get paid, or you're going to get sued. And I think that's just something that we collectively, from a societal perspective, have to agree on. SAGES is actually going to host a big conference on this in February in Houston that's going to bring in some of the regulators from CMS, and some of the companies as well, to really talk about the implications of building out this type of technology. So it's a good question.
Great talk. Can I ask a question about training the AI? Sure. You talked about how any one location is not going to have enough cases to build that collective consciousness, and you need a sort of worldwide collection of cases. That problem is magnified in pediatric surgery. We could look at every newborn TEF repair for a hundred years and not get to a million cases to train the AI. Is there a way to augment the training of the AI by using either experience or video annotation that can go off script, off what's actually on the video, to even further augment the training and more rapidly make it more reliable? Yeah, absolutely. And actually, that's some of the work that we're doing at MGH, trying to think about the rare scenarios. We're focusing on common bile duct injury, which is very, very rare. So we have decision maps of different operations and where the rare scenarios may or may not occur, and you can actually build in some manual features that allow the system to try to replicate them. So you can build off of either photographs or images. And then you can use a technology called generative adversarial networks. These are actually two neural networks that are paired against one another. One neural network's goal is to generate an artificial scenario. The other's goal is to identify an artificial scenario that was generated by a neural network. And as the two compete against one another, both of their performances improve. And so the idea here is to use those neural networks to create fake surgeries, basically — fake surgeries that mimic the real thing enough that they can be used to train algorithms. That's actually one of the technologies used to build those deepfakes: generative adversarial networks. So we can create fake data, artificial data, that can then be used to augment a real data set.
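The adversarial setup described above can be shown in miniature. This is a deliberately toy sketch of the two competing objectives, not a real surgical-video GAN: the "generator" is just a scale-and-shift of noise, the "discriminator" is a logistic score on scalars, and every number here is an illustrative assumption.

```python
import math
import random

# Toy GAN: a generator learns to produce samples the discriminator
# cannot distinguish from real data. All models here are illustrative.

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def train_toy_gan(real_data, steps=2000, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = [1.0, 0.0]  # generator params: fake = w[0] * z + w[1]
    v = [0.0, 0.0]  # discriminator params: D(x) = sigmoid(v[0] * x + v[1])
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        fake = w[0] * z + w[1]
        real = rng.choice(real_data)
        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        for x, y in ((real, 1.0), (fake, 0.0)):
            p = sigmoid(v[0] * x + v[1])
            v[0] -= lr * (p - y) * x       # logistic-loss gradient
            v[1] -= lr * (p - y)
        # Generator step: push D(fake) toward 1, i.e. fool the discriminator.
        p = sigmoid(v[0] * fake + v[1])
        g = -(1.0 - p) * v[0]              # d(-log D(fake)) / d(fake)
        w[0] -= lr * g * z
        w[1] -= lr * g
    return w, v
```

With real data clustered around one value, the generator's offset drifts toward that cluster as the two models compete — the same dynamic that, scaled up to deep networks, produces the synthetic "fake surgeries" described above.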
In the self-driving car world, all the companies have different splits, depending on the company. Toyota, I believe, uses about 80 to 90% real data and then 10 to 20% simulated data generated by adversarial networks. Great talk. So there's an effort here and at other places using voice, the sounds in the room, using Alexa and things — which seems a little creepy, that you're being monitored like this — that might help annotate your cases. Because if someone asks for a clip applier, they're probably going to clip the duct or the vessel; but also to identify moments when things start to unravel, just based on the tone and the tenor in the room. So I thought of, number one, combining that with what you're doing, and, yeah, helping the humans in the room realize that they're starting to unravel. No, absolutely. So that's actually the specific type of work that Teodor Grantcharov is engaged in up in Toronto. His head of AI there is a big natural language processing guy from U of T. And so their work has been looking specifically at the interactions between the surgeon and anesthesiologist, surgeon and circulator, surgeon and scrub, and the events that are happening in the intraoperative video feed. So one of the collaborations that we have is trying to figure out how we combine our data sets, because they're structured a little bit differently, so that our data can talk to one another. We're actually getting a black box at MGH, hopefully this year, to be able to accelerate some of that research. But that's exactly a line of research that we're thinking about. Love the talk. This seems to lend itself really well to laparoscopic, minimally invasive surgery, where you already have a video. For open surgery, which a lot of us still do — any thoughts on the data input for that? We've had residents play with Google Glass. Yeah, yeah.
Yeah, so that's actually a really, really hard problem to tackle. When we first started working on this, we tried open surgery first, and we had too many problems with visual occlusion. So when people stick their heads over the field, the overhead camera is blocked. If you mount a camera on somebody's head, well, then you actually have to create a triangulation system to track where that camera is in real time to get accurate data about what you're seeing. So that's actually still an unsolved problem. We've tried different IR sensors; there's too much reflection off the instruments, or glare from the light. Open surgery is really, really hard to tackle unless it's a very controlled, confined type of operation. So, you know, if you're doing an AV fistula, that's reasonable. But if you're doing anything in the belly or in the chest, all bets are off. And we have a long way to go on sensor technology to be able to do that. Well, Daniel, I think you set a recent-years record for the most people asking questions at a grand rounds, and there were more, but we have to stop because of the time. I think that says something about the interest in your topic, the likely impact of your topic, and the way you've educated us. So congratulations, and thanks so much for doing this. Thank you. Thank you for having me.