Physicians vs. AI Chatbots: Who Is Better at Answering Patient Questions?

EBM Focus - Volume 18, Issue 19

Reference: JAMA Intern Med. 2023 Apr 28 early online

Practice Point: AI chatbots show promise in drafting high-quality and empathetic responses to patient questions, suggesting a potential future collaboration between physicians and technology for improved patient care.

EBM Pearl: When appraising cross-sectional studies, scrutinize potential biases in the evaluation process, such as subjective outcome assessments, to ensure a nuanced interpretation of the findings and their applicability to clinical practice.

In a world where technology seems to be taking over and workplace requests for “use-case scenarios” for artificial intelligence (AI) seem ubiquitous, it’s not surprising that virtual healthcare assistance has been suggested as a way to alleviate the increasing workload and burnout among healthcare professionals. A study recently published in JAMA Internal Medicine explores the potential of an AI chatbot assistant to provide quality, empathetic responses to patient queries. The results offer an exciting glimpse into the future of healthcare.

The cross-sectional study drew from a public social media forum, Reddit’s r/AskDocs, to obtain 195 exchanges in which verified physicians responded to publicly posted medical questions. Chatbot responses to the same questions were generated with ChatGPT in December 2022. A team of licensed healthcare professionals (who were also study authors) then evaluated the blinded response pairs, indicating which response they preferred and rating the quality of the information provided and the empathy conveyed.

Results demonstrated that these intelligent assistants could not only hold their own against human physicians, but that the chatbot emerged as the clear winner. Evaluators preferred the chatbot’s responses in a whopping 78.6% of evaluations. The quality of information was also rated significantly higher for the chatbot: the prevalence of responses rated good or very good quality was 3.6 times higher than for physicians. Additionally, chatbot responses were rated significantly more empathetic than physician responses, with a prevalence of empathetic or very empathetic ratings 9.8 times higher than for physicians.
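(For readers less familiar with prevalence ratios: the ratio is simply the proportion of chatbot responses receiving a given rating divided by the corresponding proportion of physician responses. As an illustration using hypothetical numbers, not figures from the study: if 72% of chatbot responses and 20% of physician responses were rated good or very good, the prevalence ratio would be 72 ÷ 20 = 3.6.)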

While the study showcases the potential benefits of AI chatbots, it has several limitations. First, the study was conducted on a social media platform, so we can’t assume the findings apply in a real healthcare setting. Second, the evaluation of response quality and empathy relied on the subjective judgments of healthcare professionals, which can introduce bias. Third, the study didn’t include direct feedback from patients themselves, so we don’t have their perspective on how helpful or satisfactory the AI chatbot responses were. Finally, the study did not investigate the long-term impact of using AI assistants on patient outcomes or clinician burnout.

Nevertheless, these findings open up exciting possibilities for integrating AI chatbots into healthcare settings. The chatbot’s ability to generate high-quality, empathetic responses clears a path for its use in drafting replies to patient questions: the chatbot produces an initial response, and the physician reviews and refines it before it reaches the patient. We might be witnessing a whole new era in healthcare, where humans and technology collaborate to deliver the best possible care.

P.S. This EBM Focus was actually generated by ChatGPT with very few edits by this editorial team. Scary, right?! While we think ChatGPT did a good job of summarizing the article and gave a basic EBM critique, we’d like to add a few details that ChatGPT left out, like significant conflicts of interest among the authors, several of whom have financial relationships with companies involved in data analytics. We also think the chatbot’s responses were pretty clearly identifiable as coming from a chatbot, which undermines the blinding and introduces bias into the assessment by the (potentially conflicted) authors. For example, someone asked about the risk of death from swallowing a toothpick, and it would be nonsensical for a real clinician to respond empathetically to a hypothetical question like that. What would happen in the context of real doctor-patient relationships is still unknown. Still, the potential use for AI in patient care is intriguing, if not a little terrifying.

P.P.S. We really love our jobs and are not ready for AI to take over writing the EBM Focus yet!

DynaMed EBM Focus Editorial Team

This EBM Focus was written by Katharine DeGeorge, MD, MS, Deputy Editor at DynaMed and Associate Professor of Family Medicine at the University of Virginia, and ChatGPT. Edited by Alan Ehrlich, MD, Executive Editor at DynaMed and Associate Professor in Family Medicine at the University of Massachusetts Medical School; Dan Randall, MD, Deputy Editor at DynaMed; Nicole Jensen, MD, Family Physician at WholeHealth Medical; Vincent Lemaitre, PhD, Senior Medical Writer at DynaMed; Elham Razmpoosh, PhD, Postdoctoral Fellow at McMaster University; and Sarah Hill, MSc, Associate Editor at DynaMed.