Early Release Articles


Use of GPT-4 to Diagnose Complex Clinical Cases

Alexander V. Eriksen, M.D., Sören Möller, M.Sc., Ph.D., and Jesper Ryg, M.D., Ph.D.

We assessed the performance of the newly released AI GPT-4 in diagnosing complex medical case challenges and compared the success rate to that of medical-journal readers. GPT-4 correctly diagnosed 57% of cases, outperforming 99.98% of simulated human readers generated from online answers. We highlight the potential for AI to be a powerful supportive tool for diagnosis; however, further improvements, validation, and addressing of ethical considerations are needed before clinical implementation. (No funding was obtained for this study.)

Received: July 10, 2023; Revised: September 15, 2023; Accepted: September 29, 2023; Published: November 9, 2023
DOI: 10.1056/AIp2300031
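
The abstract above compares GPT-4 with "simulated human readers generated from online answers" but does not spell out the procedure. The following is a minimal sketch of one way such a comparison could be set up, assuming each simulated reader answers every case independently and is correct with probability equal to the share of online respondents who chose the right diagnosis. The function names, the per-case shares, and the model score are hypothetical placeholders, not study data, and this is not necessarily the authors' method.

```python
import random


def simulate_reader_scores(answer_shares, n_readers=10_000, seed=0):
    """Return the number of correct answers for each simulated reader.

    answer_shares[i] is the share of online respondents who picked the
    correct diagnosis for case i (hypothetical values, not study data).
    Each simulated reader is correct on case i with that probability.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < share for share in answer_shares)
        for _ in range(n_readers)
    ]


def fraction_outperformed(reader_scores, model_score):
    """Fraction of simulated readers scoring strictly below the model."""
    return sum(s < model_score for s in reader_scores) / len(reader_scores)


# Hypothetical inputs for illustration only.
shares = [0.2, 0.35, 0.5, 0.4, 0.3, 0.45, 0.25, 0.6, 0.3, 0.4]
scores = simulate_reader_scores(shares)
print(fraction_outperformed(scores, model_score=6))
```

Fixing the seed keeps the simulated population reproducible. Counting only readers who score strictly below the model is one design choice; ties could also be counted as half.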



High-Impact Medical Journals Reflect Negative Sentiment Toward Psychiatry

Roy H. Perlis, M.D., M.Sc., and David S. Jones, M.D., Ph.D.

Psychiatry as a medical specialty has historically been considered less scientifically grounded than other disciplines. Despite progress in understanding the biological basis of psychiatric disease and efforts to diminish stigma associated with mental illness, a negative bias against psychiatry may persist in the medical community. Large language models provide an opportunity to investigate this hypothesis at scale. Our objective was to characterize the extent to which articles published in high-impact medical journals may reflect more negative sentiment about psychiatry compared with those from other medical specialties. We analyzed Entrez/PubMed entries published between 2017 and 2022 relating to psychiatry, neurology, oncology, and cardiology in four high-impact medical journals: British Medical Journal, Journal of the American Medical Association, Lancet, and New England Journal of Medicine. We used the large language model GPT-4 to score each article’s title and abstract in terms of valence, that is, whether the abstract and title were likely to increase or decrease optimism about progress in a given medical specialty. Overall, and in each of the four journals, publications in psychiatry were significantly more likely to be negatively valenced than those of other specialties (P<0.001 for all omnibus χ² and post hoc pairwise contrasts). Negative-valence scores were found for 19.5% of publications relating to psychiatry, compared with 6.1% for cardiology, 6.1% for oncology, and 10.7% for neurology. Results were similar in analyses restricted to publications with abstracts, those reporting original research, those published before the Covid-19 pandemic, or those published in 2020 or later. In logistic regression models adjusting for journal, publication year, article type, and presence or absence of abstract, psychiatric publications were significantly more likely to be negatively valenced than all other specialties. Permuting the article specialty did not meaningfully change estimates of valence, indicating that the results were not attributable to psychiatry-specific terms. Published psychiatry articles were on average more likely than other specialty articles to reflect negative valence about the specialty. Whether this difference reflects volume and type of submission, or bias among editors or reviewers, merits further study. Regardless of the mechanism, the potential contribution of these articles to perpetuating negative attitudes toward psychiatry is also worthy of further investigation.

Received: August 3, 2023; Revised: September 29, 2023; Accepted: October 3, 2023; Published: November 9, 2023
DOI: 10.1056/AIcs2300066
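
The abstract above reports an omnibus chi-square contrast of negative valence across specialties. As a rough illustration of what that statistic measures, the sketch below computes Pearson's chi-square for a specialty-by-valence contingency table. The counts are hypothetical placeholders (the abstract gives percentages only), and `chi_square_statistic` is an illustrative helper, not the authors' code.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    table is a list of rows of observed counts, e.g. one row per
    specialty with columns [negative, not negative]. Every row and
    column total must be nonzero, or the expected count is undefined.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts for illustration only: [negative, not negative].
counts = [
    [40, 160],  # psychiatry
    [20, 180],  # neurology
    [12, 188],  # oncology
    [12, 188],  # cardiology
]
stat, dof = chi_square_statistic(counts)
print(stat, dof)
```

A P value would come from comparing the statistic with the chi-square distribution at the given degrees of freedom; if SciPy is available, `scipy.stats.chi2_contingency` performs both steps.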



Development Pipeline and Geographic Representation of Trials for Artificial Intelligence/Machine Learning-Enabled Medical Devices (2010 to 2023)

Miquel Serra-Burriel, M.D., Luca Locher, B.Sc., and Kerstin N. Vokinger, M.D., J.D., Ph.D.

A high number of artificial intelligence/machine learning (AI/ML)-enabled medical devices are currently in development. To understand the development pipeline and worldwide geographic distribution of clinical trials for AI/ML-enabled medical devices that may enter the market in the upcoming years, we analyzed the trends in registration of clinical trials for AI/ML-enabled medical devices between 2010 and 2023 as well as their geographic distribution. We aggregated all registered trials initiated between January 1, 2010, and August 31, 2023, through the World Health Organization’s International Clinical Trials Registry Platform and included all clinical studies for AI/ML-enabled medical devices in our study cohort. Among the 710,800 registered clinical trials in this time period, 2669 clinical trials for AI/ML-enabled medical devices were identified and included in our study cohort. Of these, 2517 clinical trials provided information on the locations where the trial was conducted. Most of the trials were conducted for the medical specialties of radiology, general hospital, gastroenterology, and urology. Almost all were national trials; 1095 were conducted in China, followed by the United States (196), Japan (162), India (139), and Korea (118). The countries with the most enrolled patients in clinical trials per 100,000 inhabitants were mainly smaller countries in Asia and Europe. More international trials should be encouraged — including the involvement of low- and middle-income countries — to improve equality and ensure that the algorithms perform well across populations. (Funded by the Swiss National Science Foundation.)

Received: July 13, 2023; Revised: September 19, 2023; Accepted: September 29, 2023; Published: November 9, 2023
DOI: 10.1056/AIpc2300038
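
The counts in the abstract above can be turned into proportions with simple arithmetic. The sketch below does so using only the figures stated in the abstract. It assumes the 2517 trials with location information are the denominator for the China count, which the abstract does not state explicitly. The `per_100k` helper illustrates the per-capita enrollment measure, and its inputs are hypothetical.

```python
def share(part, whole):
    """Return part / whole as a fraction."""
    return part / whole


def per_100k(enrolled, population):
    """Enrolled patients per 100,000 inhabitants."""
    return enrolled * 100_000 / population


# Figures taken from the abstract.
all_trials = 710_800
ai_ml_trials = 2669
with_location = 2517
china_trials = 1095

print(f"AI/ML share of all trials: {share(ai_ml_trials, all_trials):.2%}")
print(f"With location data: {share(with_location, ai_ml_trials):.1%}")
# Assumption: trials with location data are the denominator here.
print(f"Conducted in China: {share(china_trials, with_location):.1%}")

# Hypothetical enrollment and population, for illustration only.
print(per_100k(enrolled=50, population=1_000_000))
```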



Characterizing the Clinical Adoption of Medical AI Devices through U.S. Insurance Claims

Kevin Wu, Eric Wu, Brandon Theodorou, Weixin Liang, Christina Mack, Lucas Glass, Jimeng Sun, and James Zou

There are now over 500 medical artificial intelligence (AI) devices that are approved by the U.S. Food and Drug Administration. However, little is known about where and how often these devices are actually used after regulatory approval. In this article, we systematically quantify the adoption and usage of medical AI devices in the United States by tracking Current Procedural Terminology (CPT) codes explicitly created for medical AI. CPT codes are widely used for documenting billing and payment for medical procedures, providing a measure of device utilization across different clinical settings. We examined a comprehensive nationwide claims database of 11 billion CPT claims between January 1, 2018, and June 1, 2023, to analyze the prevalence of medical AI devices based on submitted claims. Our results indicate that medical AI device adoption is still nascent, with most usage driven by a handful of leading devices. For example, only AI devices used for assessing coronary artery disease and for diagnosing diabetic retinopathy have accumulated more than 10,000 CPT claims. Furthermore, we found that zip codes that had a higher income level, were metropolitan, and had academic medical centers were much more likely to have medical AI usage. Our study sheds light on the current landscape of medical AI device adoption and usage in the United States, underscoring the need to further investigate barriers and incentives to promote equitable access and broader integration of AI technologies in health care.

Received: July 9, 2023; Revised: September 15, 2023; Accepted: September 29, 2023; Published: November 9, 2023
DOI: 10.1056/AIoa2300030
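
The abstract above measures adoption by counting claims per AI-specific CPT code and noting which codes pass 10,000 claims. A minimal sketch of that tally follows, assuming claims arrive as `(code, zip_code)` pairs. The record layout, the placeholder code labels, and the function name are hypothetical and do not reflect the study's actual database or real CPT codes.

```python
from collections import Counter


def codes_above_threshold(claims, threshold=10_000):
    """Return {code: claim_count} for codes with more than threshold claims.

    claims is an iterable of (code, zip_code) pairs; only the code is
    used for this tally.
    """
    counts = Counter(code for code, _zip_code in claims)
    return {code: n for code, n in counts.items() if n > threshold}


# Hypothetical records with placeholder labels, for illustration only.
claims = [("CODE_A", "00001")] * 3 + [("CODE_B", "00002")]
print(codes_above_threshold(claims, threshold=2))
```

Grouping the same records by zip code instead of by code would give the per-area usage counts that the abstract relates to income, metropolitan status, and academic medical centers.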