
How Is AI Used in Health Care?

Contributors: Daniel Restrepo, MD, and Raja-Elie Abdulnour, MD

Everyone makes mistakes. Doctors, even with years of training, are no exception.

Each year, hundreds of thousands of patients in the United States are disabled or die because of delayed, missed, or incorrect diagnoses. The National Academies of Sciences, Engineering, and Medicine attributes roughly 10% of patient deaths, in findings spanning back decades, to these mistakes. Mistakes affect wallets, too: testing and retesting for conditions can cost patients thousands of dollars.

Artificial intelligence (AI), technology that lets computers mimic human thinking, could improve, or even save, the lives of countless people with its own diagnoses. Recent research from Mass General Brigham suggests a new type of AI can diagnose patients with impressive accuracy. Known as a “chatbot,” this AI uses data from the internet to provide human-like answers to questions.

“Chatbots are not going to replace doctors,” says Daniel Restrepo, MD, a Mass General Brigham hospital medicine specialist. “If anything, doctors would use these chatbots like a reference source, such as a medical textbook. They would be adjunct tools to make their diagnoses more accurate.”

Dr. Restrepo and Raja-Elie Abdulnour, MD, a Mass General Brigham pulmonary care specialist, describe how AI can help shape the future of patient care.

How do doctors make a medical diagnosis?

Doctors diagnose patients from a seemingly infinite list of conditions learned in medical school. Subtle differences distinguish one condition from another. Even the most seasoned doctor can struggle to identify the right condition from limited details.

A doctor eliminates possible conditions using a step-by-step process known as clinical reasoning. They begin by gathering a few basic clues, asking for a patient’s:

  • Age
  • Gender
  • Symptoms
  • Existing health conditions
  • Personal medical history
  • Reason for visit
  • Family health history

The doctor then pauses to consider their patient’s answers, or “history.” After completing their evaluation, the doctor creates a full list of possible conditions.

From there, a doctor begins to eliminate possibilities by asking targeted questions. Such questions might ask about over-the-counter medications, or how pain changes in response to certain activities. They can then order imaging or laboratory tests to confirm a hypothesis and discard more possibilities. If the results disprove their hypothesis, they start over and cross-examine the next most-likely condition.
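
To make the elimination step concrete, here is a toy sketch of that logic written as Python code. The conditions and findings are invented for illustration and are not clinical guidance:

    # Toy illustration of diagnosis by elimination. The conditions and findings
    # below are made up for the example and are not medical advice.
    candidates = {
        "bruised rib": {"chest pain", "recent injury"},
        "coronary artery disease": {"chest pain", "smoker", "pain on exertion"},
        "acid reflux": {"chest pain", "pain after meals"},
    }

    findings = set()

    def narrow(new_finding):
        """Record a new finding and keep only conditions consistent with every finding so far."""
        findings.add(new_finding)
        return {name for name, features in candidates.items() if findings <= features}

    print(narrow("chest pain"))  # all three conditions still fit
    print(narrow("smoker"))      # -> {'coronary artery disease'}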

What is an example of a medical diagnosis?

Dr. Restrepo uses a hypothetical 55-year-old male patient to illustrate a typical diagnosis. If the patient arrives at his clinic with chest pain, Dr. Restrepo wouldn’t have enough information to diagnose the patient based on those symptoms alone. The patient could have a condition as simple as a bruised rib or as complex and severe as a heart attack.

“Imagine the patient as an iceberg,” adds Dr. Abdulnour. “It’s up to us to consider the most obvious clues before making our way underneath the surface.”

When asked about his medical history, the patient may mention that he smokes every day and takes medication for type 2 diabetes. This information narrows Dr. Restrepo’s search to the most likely condition: coronary artery disease. When the patient mentions that he feels pain when climbing stairs, Dr. Restrepo feels more confident in his prediction and orders the necessary tests.

Errors in medical diagnoses

A doctor interacting with a patient represents one of several steps toward a diagnosis. The process begins the moment a patient notices their symptoms. By the time a patient sees their doctor face-to-face, several other steps impact a diagnosis, including:

  • Determining the closest available medical center
  • Interactions with nurses and administrative assistants
  • Insurance coverage

Human error can impact each step. Health care workers tired from a long day might not record the correct information from a patient, or a language barrier might lead to miscommunications.

Doctors leave similar room for error when assessing symptoms. A few common errors include:

Environmental bias

If a doctor sees an overwhelming number of patients at their practice with one disease, they may feel tempted to test for the same disease without ruling out other conditions first. During the winter, a New England doctor might see a patient with a cough and fever and immediately test for the flu. They might not consider less common, albeit plausible, diseases, such as tuberculosis, without first asking about the patient’s travel history.

Racial bias

Doctors are more likely to misdiagnose a patient of color than a patient who is white. Medical textbooks written decades ago often only consider how a disease appears among patients who are white. Without knowing how the same symptoms appear on individuals of different races, doctors can easily mistake one disease for another.

Dr. Restrepo uses the autoimmune disease lupus as an example. Lupus, he says, sometimes presents as a bright red rash on the face of a patient. Doctors can easily recognize the rash on lighter skin. But, without knowing how the rash appears on darker skin, doctors might not identify the rash as unique to lupus and confuse it with another skin condition instead.

Shortcuts

The brain takes shortcuts to answers when confronted with large amounts of information. According to Dr. Abdulnour, doctors can take shortcuts without realizing it. Their training, he says, can enable such behavior. In medical school, students memorize symptoms of thousands of diseases, often without learning how to properly apply the knowledge. While such training can lead doctors to a correct diagnosis in most cases, it can also lead to mental pitfalls known as cognitive biases.

“Students can catch themselves trying to perfectly match a disease to certain symptoms, but nothing is going to be perfect,” he adds. “It’s a lot more likely a patient has a common disease presenting atypically than a rare disease presenting perfectly.”

Mistrust

Someone can ask the same question two different ways and receive two different answers. How a doctor treats their patient can produce similar results. Patients tend to trust doctors who actively listen to concerns and establish warm, friendly environments at their practice. Doctors can erode that trust by rushing through appointments and dismissing patient concerns.

A disgruntled patient might withhold personal information that could otherwise lead doctors to the correct diagnosis.

How does AI work?

AI is modeled after the human brain. The brain uses millions of neurons to collect and process information, moving from one thought to another. Computer code acts as AI’s “neurons.” Code instructs AI how to think; it moves from data point to data point in a fraction of the time it takes a brain to move between thoughts.
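
As a rough illustration of the “code as neurons” idea, here is a simplified sketch of a single artificial neuron in Python. It is not how any particular medical AI is built; it only shows the basic pattern of weighting inputs, summing them, and squashing the result:

    # A single artificial "neuron": weight each input, sum the results,
    # then pass the total through a sigmoid activation function.
    import math

    def neuron(inputs, weights, bias):
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1 / (1 + math.exp(-total))  # squashes the output between 0 and 1

    # Example: three made-up data points flowing through one neuron.
    print(neuron(inputs=[0.2, 0.7, 0.1], weights=[0.9, -0.4, 0.3], bias=0.05))

Real models chain millions of these units together, adjusting the weights during training so that useful patterns emerge.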

“Our brains are constantly gathering data points, and we’re cross-examining that data with what we already know,” says Dr. Restrepo. “But when it comes to finding a diagnosis — that needle in a haystack of information — we can’t revisit every case learned or reread every page of medical literature, especially if there are dozens of other patients who need our attention.”

While the human brain might process four or five of the likeliest paths through the figurative haystack, the most advanced AI models, known as large language models (LLMs), or chatbots, can account for millions. Using billions of data points, humans train chatbots to simulate each pathway through the haystack. The model, in turn, can receive new information, compare it to its trained memory, and select the most probable diagnosis.

Examples of AI in health care

The chatbot ChatGPT is trained on a vast amount of the internet’s publicly available information. If someone types a word into the chatbot, it responds with the most likely word to follow. It uses an estimated 4 billion pages of text to provide its answer, an impossible amount of information for the human brain to comprehend all at once.
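
A toy version of “respond with the most likely word to follow” can be sketched with simple word-pair counts. Real chatbots learn from billions of pages rather than one sentence, but the core idea of choosing the statistically likeliest continuation is similar:

    # Count how often each word follows another, then predict the most common follower.
    from collections import Counter, defaultdict

    text = "the patient reports chest pain and the patient reports shortness of breath"
    words = text.split()

    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1

    def most_likely_next(word):
        candidates = following.get(word)
        return candidates.most_common(1)[0][0] if candidates else None

    print(most_likely_next("patient"))  # -> "reports"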

In a perfect world, a doctor might ask a chatbot to rank the likelihood of several diseases by entering their patient’s age, gender, and symptoms. Then, as they receive more information from their patient — their history and test results — they can ask the machine to narrow the list of possibilities using the new information.
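
A hypothetical sketch of that workflow appears below. The prompt wording and the two-pass structure are illustrative assumptions, not a real Mass General Brigham tool or any specific chatbot’s interface:

    # Hypothetical example of building and refining a diagnostic prompt.
    # The wording is illustrative only; it is not a validated clinical tool.
    def build_prompt(age, gender, symptoms, new_findings=None):
        prompt = (
            f"Patient: {age}-year-old {gender}. "
            f"Symptoms: {', '.join(symptoms)}. "
            "Rank the most likely diagnoses with brief reasoning."
        )
        if new_findings:  # history or test results narrow the list
            prompt += f" New findings: {', '.join(new_findings)}."
        return prompt

    # First pass: basic clues only.
    print(build_prompt(55, "male", ["chest pain"]))
    # Second pass: the doctor adds history and test results as they arrive.
    print(build_prompt(55, "male", ["chest pain"], ["daily smoker", "type 2 diabetes"]))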

Across Mass General Brigham, doctors are already testing the abilities of AI.

The risks of AI in health care

Doctors marvel at how well chatbots communicate information. To many, the answers sound all too human; it can feel like talking to another doctor.

However, chatbots are far from perfect. Whether the model pulls a needle or straw out of its haystack depends on the data used to train it.

“There’s an old saying that garbage in equals garbage out,” says Dr. Restrepo. “If you put in the wrong information, you’re going to get the wrong answers.”

Several crucial flaws limit the current capabilities of chatbots:

Made-up information

Plenty of inaccurate information exists online. When trained with this information, a chatbot can produce faulty hypotheses. Sometimes, models invent information outright or fabricate a citation from a misread paper.

More troubling: This inaccurate information, called “hallucinations,” can sound convincing and trustworthy. Models, after all, are trained to communicate information in as human a way as possible.

Biases and stereotypes

Answers from chatbots can also reflect longstanding biases and stereotypes published online. For example, if someone asks a chatbot what job a girl wants when she grows up, the chatbot will likely list “nurse,” “stay-at-home mother,” and other stereotypical jobs for a woman. If asked what job a boy wants, the chatbot tends to reinforce sexist stereotypes, producing answers such as “CEO” or “doctor.”

Racial stereotypes are common, too. In a new study, Brigham and Women’s Hospital researchers witnessed chatbots perpetuate and exaggerate both types of stereotypes in clinical scenarios. Changing the race or gender of the patient, they found, could significantly affect how the chatbot made a diagnosis. Race, ethnicity, and gender affected how a chatbot responded to questions about subjective patient traits, like honesty, understanding, and pain tolerance. 

One case in point: A chatbot from the study was more likely to label Black male patients as abusing the drug Percocet® than female patients who were Asian, Black, Hispanic, or white.

"It's a bit like scientists looking into the mirror, which can be quite humbling," says Dr. Abdulnour. "We're seeing models perpetuate many of the stereotypes that research biases have perpetuated for decades. We must always be prepared for biased responses and know how to address them. Also, we can learn biases in our own thinking that we never knew we had just by studying bias in AI.”

Stubborn answers

Once a chatbot produces an answer, it can resist changing it, even when presented with evidence to the contrary. When diagnosing a rare condition, it might fixate on one part of a patient’s case. This stubborn behavior makes it harder for the model to reassess its prediction when provided new information about the patient.

A similar error, says Dr. Restrepo, often occurs among residents and attendings in the earliest stages of their medical careers.

Could AI ever replace a doctor?

Chatbots cannot train themselves with up-to-date, accurate information. Nor can they detect and learn from their own shortcomings. They rely on doctors to function. Asking anything more of a chatbot will likely result in a misdiagnosis.

“Pitting a doctor like Dr. Restrepo against a chatbot is almost like pitting Tom Brady against a Division I college-level quarterback,” says Dr. Abdulnour. “The chatbot might be competent, but it’s definitely not going to outperform him any time soon.”

Yet, despite their limitations, chatbots can serve doctors in a variety of ways. Many, like Dr. Restrepo, liken chatbots to an extra hard drive for the brain. Just like a computer, the brain can only store so much information in its memory. Chatbots offer doctors a way to access the information their memory can’t hold.

Doctors can also use chatbots to:

  • Maintain and update records
  • Debate or generate hypotheses
  • Find publications or academic citations

When will AI be used in clinics?

Many doctors already use chatbots the same way they do a search engine: as a starting point for looking up or verifying basic information. To make chatbots more trustworthy for doctors, Drs. Restrepo and Abdulnour are researching:

  • Question prompts they know will produce accurate answers
  • The full limits of chatbots, including their risks, efficiencies, and answer quality

Before relying on chatbots in a high-stakes clinical environment, both doctors want to see guardrails created for the technology. A code of conduct spearheaded by the National Academy of Medicine could outline best practices for the technology. The best practices would need to minimize the spread of misinformation and ensure the technology produces safe and accurate results. They would also need to protect equal access to the technology.

AI chatbots will evolve to produce more accurate, and more lifelike, answers. Predictions from Mass General Brigham experts about artificial intelligence have heightened expectations for the years ahead. Drs. Restrepo and Abdulnour, however, urge patients not to expect a perfect, all-knowing device. The futuristic lure of chatbots might dazzle in the news, but the technology’s built-in imperfections remain.

Today, those imperfections remind patients of what a perfect diagnostic device really is: science fiction.

Daniel Restrepo, MD
Contributor
Hospital Medicine Specialist

Raja-Elie Abdulnour, MD
Contributor
Pulmonary Care Specialist