The promise – and pitfalls – of medical AI headed our way
A patient is lying on the operating table as the surgical team reaches an impasse. They can’t find the intestinal rupture. A surgeon asks aloud: “Check whether we missed a view of any intestinal section in the visual feed of the last 15 minutes.” An artificial intelligence medical assistant gets to work reviewing the patient’s past scans and highlighting video streams of the procedure in real time. It alerts the team when they’ve skipped a step in the procedure and reads out relevant medical literature when surgeons encounter a rare anatomical phenomenon.
Doctors across all disciplines, with assistance from artificial intelligence, may soon have the ability to quickly consult a patient’s entire medical file against the backdrop of all health care data and every published piece of medical literature online. This potential versatility in the doctor’s office has only now become possible thanks to the latest generation of AI models.
“We see a paradigm shift coming in the field of medical AI,” said Jure Leskovec, professor of computer science at Stanford Engineering. “Previously, medical AI models could only address very small, narrow pieces of the health care puzzle. Now we are entering a new era, where it’s much more about larger pieces of the puzzle in this high stakes field.”
Stanford researchers and their collaborators describe generalist medical artificial intelligence, or GMAI, as a new class of medical AI models that are knowledgeable, flexible, and reusable across many medical applications and data types. Their perspective on this advance is published in the April 12 issue of Nature.
Leskovec and his collaborators describe how GMAI will interpret varying combinations of data from imaging, electronic health records, lab results, genomics, and medical text, well beyond the abilities of current models like ChatGPT. These GMAI models will provide spoken explanations, offer recommendations, draw sketches, and annotate images.
“A lot of inefficiencies and errors that happen in medicine today occur because of the hyper-specialization of human doctors and the slow and spotty flow of information,” said co-first author Michael Moor, an MD and now postdoctoral scholar at Stanford Engineering. “The potential impact of generalist medical AI models could be profound because they wouldn’t be just an expert in their own narrow area, but would have more abilities across specialties.”
Medicine without borders
Of the more than 500 AI models for clinical medicine approved by the FDA, most only perform one or two narrow tasks, such as scanning a chest X-ray for signs of pneumonia. But recent advances in foundation model research promise to solve more diverse and challenging tasks. “The exciting and the groundbreaking part is that generalist medical AI models will be able to ingest different types of medical information – for example, imaging studies, lab results, and genomics data – to then perform tasks that we instruct them to do on the fly,” said Leskovec.
“We expect to see a significant change in the way medical AI will operate,” continued Moor. “Next, we will have devices that, rather than doing just a single task, can do maybe a thousand tasks, some of which were not even anticipated during model development.”
The authors, who also include Oishi Banerjee and Pranav Rajpurkar from Harvard University, Harlan Krumholz from Yale, Zahra Shakeri Hossein Abad from the University of Toronto, and Eric Topol at the Scripps Research Translational Institute, outline how GMAI could tackle a variety of applications, from chatbots for patients to note-taking to bedside decision support for doctors.
In the radiology department, the authors propose, models could draft radiology reports that visually point out abnormalities, while taking the patient’s history into account. Radiologists could improve their understanding of cases by chatting with GMAI models: “Can you highlight any new multiple sclerosis lesions that were not present in the previous image?”
In their paper, the scientists describe additional requirements and capabilities that are needed to develop GMAI into a trustworthy technology. They point out that the model needs to consume all of a patient’s personal medical data, as well as historical medical knowledge, and refer to it only when interacting with authorized users. It then needs to be able to hold a conversation with a patient, much like a triage nurse or doctor, to collect new evidence and data or suggest various treatment plans.
Concerns for future development
In their research paper, the co-authors address the implications of a model capable of 1,000 medical tasks with the potential to learn even more. “We think the biggest problem for generalist models in medicine is verification. How do we know that the model is correct – and not just making things up?” Leskovec said.
They point to the flaws already being caught in the ChatGPT language model. An AI-generated image of the pope wearing a designer puffy coat may be merely funny. “But if there’s a high-stake scenario and the AI system decides about life and death, verification becomes really important,” said Moor.
The authors add that safeguarding privacy is also a necessity. “This is a huge problem because with models like ChatGPT and GPT-4, the online community has already identified ways to jailbreak the current safeguards in place,” Moor said.
“Deciphering between the data and social biases also poses a grand challenge for GMAI,” Leskovec added. GMAI models need the ability to focus on signals that are causal for a given disease and ignore spurious signals that only tend to correlate with the outcome. With model size expected to keep growing, Moor points to early research showing that larger models tend to exhibit more social biases than smaller ones. “It is the responsibility of the owners and developers of such models and vendors, especially if they’re deploying them in hospitals, to really make sure that those biases are identified and addressed early on,” said Moor.
“The current technology is very promising, but there’s still a lot missing,” Leskovec agreed. “The question is: can we identify current missing pieces, like verification of facts, understanding of biases, and explainability/justification of answers so that we give an agenda for the community on how to make progress to fully realize the profound potential of GMAI?”
Rajpurkar, co-senior author of the paper, is a former computer science PhD student at Stanford School of Engineering, and Banerjee, co-first author, is a former master’s student in computer science at Stanford School of Engineering. Leskovec is also a member of Stanford Bio-X, a member of the Wu Tsai Neurosciences Institute, and a faculty affiliate in the Institute for Human-Centered Artificial Intelligence.
This research was funded by the National Institutes of Health, the Defense Advanced Research Projects Agency, GSK, Wu Tsai Neurosciences Institute, the Army Research Office, the National Science Foundation, the Stanford Data Science Initiative, Amazon, Docomo, Hitachi, Intel, JPMorgan Chase, Juniper Networks, KDDI, NEC, and Toshiba. In the past three years, Krumholz received expenses and/or personal fees from UnitedHealth, Element Science, Eyedentifeye, and F-Prime; is a co-founder of Refactor Health and HugoHealth; and is associated with contracts, through Yale New Haven Hospital, from the Centers for Medicare & Medicaid Services and through Yale University from the Food and Drug Administration, Johnson & Johnson, Google and Pfizer. The other authors declare no competing interests.