Artificial Intelligence for Skin Cancer Identification: The Downside of an Abundance of Caution

EBM Focus - Volume 19, Issue 4

Reference: Br J Dermatol. 2024 Jan 17 early online

Practice Point: This AI program correctly picks up melanoma more frequently than primary care providers, but also generates a lot more false alarms.

EBM Pearl: An AUROC value describes how well a model discriminates between two outcomes across all possible thresholds in a given population.

Artificial intelligence (AI), of a sort, has been around for years in medicine, starting with machines reading EKGs. But now radiologists, pathologists, and even dermatologists are starting to see how AI-based visual pattern recognition may change their fields in the near future. The British Journal of Dermatology published a report from the co-founder of a Swedish company on their AI program, which is being used to help primary care providers sort out which suspicious skin lesions are most likely to be melanomas.

The AI program in question was trained on dermoscopic images of melanomatous and non-melanomatous lesions. The PCPs in the study were asked to take dermoscopy photographs of skin lesions they believed were potentially melanomas and, having already decided to either biopsy the lesion or refer it to a dermatologist for a second opinion, to then ask the AI program for its assessment.

For those of you who may not know, dermoscopy involves using a special high-powered, back-lit magnifying glass (currently costing between $800 and $1,400) to zoom in on suspicious skin lesions and help identify which ones should be biopsied. Most dermatologists use dermoscopes regularly, but few primary care providers (in our experience) are entirely comfortable with dermoscopy.

In this study, 253 lesions (on 223 patients) underwent dermoscopic AI analysis followed by either biopsy or a consulting dermatologist's determination that biopsy was unnecessary. In all, 119 lesions were deemed benign by dermatologists and not biopsied. The PCPs classified 20% of the lesions as highly suspicious for melanoma (the other 80% were suspicious enough to refer or biopsy, but not highly so), while the AI program, which classified each lesion as either having evidence of melanoma or not, flagged 44% as having dermoscopic evidence of melanoma. Of the 134 lesions biopsied, 11 were invasive melanoma and 10 were melanoma in situ. The AI program stated that one melanoma in situ did not have “any evidence of melanoma” on dermoscopic appearance. The PCPs also classified this “missed” lesion as low likelihood, though still suspicious enough to biopsy or refer. Another 8 lesions that turned out to be melanoma were flagged by the AI program as having evidence of melanoma, placing them in a higher risk category than the PCPs' assessment.

Researchers took all this information and calculated an AUROC value. AUROC stands for Area Under the Receiver Operating Characteristic curve, a method for evaluating how well a test can correctly rank samples into a binary category. (In this case, the categories are a straightforward “likely to be melanoma based on appearance” or “not likely to be melanoma.”) An AUROC value is calculated for a given population by plotting the test's sensitivity (the true positive rate: how often real melanomas are caught) against its false positive rate (1 − specificity: how often benign lesions are falsely flagged) across all possible thresholds. In general, an AUROC of 0.5 means the test is no better than chance at discriminating between the two outcomes, while a value of 1.0 means the test always discriminates correctly between them in a given population. This app had an astounding AUROC of 0.96 (95% CI 0.93-0.98) for differentiating melanomas from other skin lesions, and it was even better at detecting invasive melanomas.
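For readers who want the intuition behind the number: an AUROC is equivalent to the probability that a randomly chosen positive case (a melanoma) receives a higher model score than a randomly chosen negative case (a benign lesion). A minimal sketch, using made-up scores rather than the study's data:

```python
# Toy illustration of AUROC as a ranking probability (Mann-Whitney U
# formulation). The scores below are hypothetical, not from the study.

def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs where the positive case
    scores higher; ties count as half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores: melanomas tend to score higher than benign lesions.
melanoma_scores = [0.9, 0.8, 0.7, 0.4]
benign_scores = [0.5, 0.3, 0.2, 0.1]

print(auroc(melanoma_scores, benign_scores))  # → 0.9375
```

One melanoma score (0.4) falls below one benign score (0.5), so 15 of the 16 pairs are ranked correctly, giving 0.9375. Perfect separation of every melanoma from every benign lesion would give 1.0; random scores would hover around 0.5.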

That said, this study illustrates one potential problem with algorithm-based diagnosis. The researchers set the sensitivity of the app fairly high, and when that happens, almost by default, the specificity drops. (In population terms, this means fewer false negatives but more false positives.) Here, an additional 72 biopsies would have been performed to find an additional 8 melanomas. (Recall from above that about half of these were melanoma in situ, which has a 5-50% rate of progression to malignant melanoma.) This trade-off may certainly be worth it, but it's misleading to say the program is significantly more accurate at diagnosing melanoma. What it is, by design, is more cautious.
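The sensitivity/specificity trade-off described above comes entirely from where the decision threshold is set on the same underlying scores. A minimal sketch with hypothetical scores (not study data) showing how lowering the threshold catches more melanomas at the cost of more false alarms:

```python
# Toy illustration: the same model scores, two different thresholds.
# All scores are hypothetical, not from the study.

def sens_spec(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when score >= threshold means
    'flag as melanoma'."""
    tp = sum(s >= threshold for s in scores_pos)  # melanomas correctly flagged
    tn = sum(s < threshold for s in scores_neg)   # benign lesions correctly cleared
    return tp / len(scores_pos), tn / len(scores_neg)

melanoma = [0.9, 0.7, 0.6, 0.3]
benign = [0.8, 0.5, 0.4, 0.2, 0.1]

for t in (0.65, 0.25):
    sens, spec = sens_spec(melanoma, benign, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With the higher threshold, half the melanomas are missed but most benign lesions are correctly cleared; with the lower threshold, every melanoma is caught but most benign lesions are flagged for biopsy. The AUROC is identical in both cases: only the operating point, chosen by the developers, changes.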

For more information, see the topic Melanoma in DynaMed.

DynaMed EBM Focus Editorial Team

This EBM Focus was written by Dan Randall, MD, MPH, FACP, Deputy Editor at DynaMed. Edited by Alan Ehrlich, MD, FAAFP, Executive Editor at DynaMed and Associate Professor in Family Medicine at the University of Massachusetts Medical School; Katharine DeGeorge, MD, MS, Senior Deputy Editor at DynaMed and Associate Professor of Family Medicine at the University of Virginia; Nicole Jensen, MD, Family Physician at WholeHealth Medical; Vincent Lemaitre, PhD, Medical Editor at DynaMed; Hannah Ekeh, MA, Senior Associate Editor at DynaMed; and Jennifer Wallace, BA, Associate Editor at DynaMed.