Artificial Intelligence (AI): the historical definition, “using computers to solve problems that would normally require human intelligence,” doesn’t quite convey how the term is commonly used in 2024. Yesterday’s healthcare tools (Coulter counters, self-interpreting EKGs, and CT and MRI reconstructions) used complicated but rule-based explicit instructions and qualify as a form of artificial intelligence. Those older programs ran on man-made algorithms and didn’t generate new content or provide many insights beyond what their programmers explicitly instructed them to do. In 2024, the term AI is generally used for programs that generate output that either answers much more nuanced questions or produces content (usually words, images, or sounds) that may not have been previously created by humans. Most commercially available AI programs today focus on one or two complicated tasks such as image/language recognition, speech/sound recognition, and/or complex decision-making. When we talk about AI in 2024, we are usually talking about the subset of AI known as generative AI.


AI (Generative): Generative AI programs are able to produce output (content) that appears novel. Generative AI is a subset of deep learning (produced by neural networks), which is itself a subset of machine learning.

AI programs analyze large datasets and locate associations and combinations among various aspects of these datasets like never before – and they do this without being explicitly told which associations to look for. After using those big datasets to train, a program is presented with new data as an input. It then generates, or predicts, an output based on the outputs it observed in the training dataset. A common misconception is that only novel outputs are generated. In fact, a generated output might be a simple “yes or no” response or a calculated number. A binary “yes or no” answer may seem simple, but when the question is “does this shadow on an x-ray represent lung cancer?” the decision-making power generative AI possesses quickly comes into focus. The big deal with the current iteration of AI is that this technology can predict answers to questions through algorithms and pathways that it creates on its own. AI regularly generates outputs (in words, images, and sounds) that have never been seen or imagined before. Large language models and networks that generate a judgment between two options (such as diseased or healthy) are examples of generative AI.

GPT-4 and Bard are just two of many text-based models that take in written or spoken words, look for associations, and “generate” a response based on an immense data set that allows them to predict the likeliest answer. That answer might be spectacularly wrong (see Hallucinations, below), but when it is correct it does an awfully good job of approximating what a human might say – sometimes better than humans themselves.
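
For readers who like concrete examples, here is a minimal sketch of this kind of next-word prediction, assuming the freely available Hugging Face transformers library and the small open-source GPT-2 model as a stand-in for commercial systems (the prompt is invented for illustration):

```python
# A minimal sketch of text generation with a small open-source language model.
# Assumes the Hugging Face "transformers" library is installed (pip install transformers);
# GPT-2 is used only as a small, freely available stand-in for larger commercial models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The most common cause of community-acquired pneumonia is"
result = generator(prompt, max_new_tokens=20, num_return_sequences=1)

# The model simply predicts the likeliest next words -- it may be right, or it may hallucinate.
print(result[0]["generated_text"])
```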


Bias: The skewing of a weighted input in such a way that it affects the output (in plain language, a fudge factor).

We are used to thinking of bias as a bad thing. Cognitive bias is well described in healthcare decision-making, and most advocates of evidence-based medicine are acquainted with the methodological and other biases that can weaken the strength of a research study. With AI, however, bias is intrinsic, and it may not always be a deficiency.

There are many entry points for bias built into AI programs, starting with deciding which problem or task a program is designed to work on and running all the way through to how outputs are delivered. In fact, between the data that is chosen for training and the programming itself, there will always be some sort of bias. Embracing this allows us to see bias as a feature, not a bug.

Suppose you have a program designed to diagnose skin cancer from an image (these programs already exist). The program may be designed to err on the side of overdiagnosis (thus increasing the rate of unnecessary biopsies) or underdiagnosis (thus increasing the rate of missed skin cancer). It is more complicated than this, but the priorities a program is given are, like any other clinical judgment, set by humans, and recognizing those priorities matters when assessing AI’s results. If you think of recognizing bias as dialing a microscope lens up or down, you can see that bias is both ubiquitous and – in some contexts – useful.
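
A minimal sketch of that “dial,” using made-up probabilities and hypothetical thresholds, shows how the same model output can be tuned toward overdiagnosis or underdiagnosis:

```python
# A minimal sketch of how a decision threshold acts as a deliberate "bias dial."
# The probabilities and threshold values below are made up for illustration only.
predicted_probabilities = [0.05, 0.20, 0.45, 0.55, 0.80, 0.95]  # model's estimated chance of melanoma

def recommend_biopsy(probability, threshold):
    """Return True if the lesion should be biopsied under the chosen threshold."""
    return probability >= threshold

# A low threshold errs toward overdiagnosis (more biopsies, fewer missed cancers).
print([recommend_biopsy(p, threshold=0.30) for p in predicted_probabilities])

# A high threshold errs toward underdiagnosis (fewer biopsies, more missed cancers).
print([recommend_biopsy(p, threshold=0.70) for p in predicted_probabilities])
```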

 

Big Data: Big data refers to the massive data sets that generative AI programs are trained on before they are asked to perform work. 

Theoretically, the bigger the data set, the more accurate and powerful a given AI program is likely to be. Most people understand the acronym GIGO to stand for garbage-in-garbage-out, but the inverse phrase (great-in-great-out) can also apply. Both uses of the term GIGO apply to Big Data. When the word “Big” is connected to data, volume, not necessarily quality, is often what comes to mind. When assessing the quality of big data sets, consider using this set of criteria, conveniently all beginning with the letter V, to aid your understanding:

  • Veracity - Is the data accurate? 
  • Variety - Can the data be corroborated? 
  • Velocity - Is the data processed quickly enough? 
  • Validity - Is the data relevant to the problem at hand?
  • Vulnerability - Can the data be corrupted, or does it affect patient privacy? 
  • Visualization - Is the scale of the data correct and are meaningful differences visible to users? 
  • Volatility - How quickly is the data changing? 
  • Value - Is there a cause-and-effect relationship? 

Not all patient-related content has equal value. Clinicians are all too familiar with chart notes copied and pasted from one patient interaction to the next, and templated notes, smart phrases, and other computer shortcuts may not always reflect real-life events. The “copy and paste” practice alone raises doubts about the veracity, validity, and value of data collected from some chart notes. Because it’s impossible to find a massive health care data set that hits all the “V” marks, use of any data set necessarily involves human decisions about what to include and exclude. Whatever data set is used, GIGO applies.

 

Deep Learning: The practice of combining computer “neural networks” to mimic the human brain’s ability to integrate many inputs for complex decision-making or pattern recognition. 

(Side Note: Machine learning, simply put, is the process of a computer teaching itself something. Deep learning is a type of machine learning in which a computer teaches itself many things at the same time, each building upon the others. Deep learning uses neural networks to do this. The generative AI programs discussed in this context are a subset of deep learning programs with specific functions.)

Healthcare workers enjoy an advantage over other people when it comes to understanding why computer scientists used the analogy of the central nervous system (“neural networks”) to describe how deep learning works. We learned in school that brain neurons almost never have a single input and almost always work in conjunction with many other interconnected neurons. Much as a neuron in the brain might have numerous dendrites and axonal branches extending in many directions, a single layer of a neural network might simultaneously use many different inputs (both from its own assessment of the question at hand and from other layers) to create one or many outputs. Deep learning relies upon multiple separate but interconnected layers of neural networks to produce outputs. There are architectures with names like multilayer perceptrons (MLPs) and convolutional neural networks (and related machine learning methods, such as random forests, that do not use neural networks at all), but the basic concept is that they all involve integrated layers of decision making.
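
As a minimal sketch (assuming the PyTorch library; the layer sizes and inputs are arbitrary choices for illustration), here is what stacked layers of a small multilayer perceptron look like in code:

```python
# A minimal sketch of a multilayer perceptron: stacked layers, each combining many inputs.
# Assumes PyTorch is installed; the layer sizes below are arbitrary choices for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 16),   # first layer: 20 input features feed 16 "neurons"
    nn.ReLU(),
    nn.Linear(16, 8),    # second layer: builds on the first layer's outputs
    nn.ReLU(),
    nn.Linear(8, 1),     # final layer: a single output
    nn.Sigmoid(),        # squashed to a 0-to-1 "probability"-like score
)

fake_patient_features = torch.randn(1, 20)   # stand-in for 20 numeric inputs about one patient
print(model(fake_patient_features))          # e.g., an estimated probability of "abnormal"
```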

Another neurology-adjacent analogy for the kind of decision making deep learning can perform is judging whether a given individual has a normal walking gait. A neurologist, physical therapist, or geriatrician may be able to make a split-second judgement about a patient’s gait without being entirely aware of all the factors that go into that judgement. Similarly, a deep learning program trained on a dataset of thousands of videos of people walking, each previously judged by expert clinicians, might make a split-second judgement about the video of a new person’s gait – someone who wasn’t in the training dataset – without the programmers being entirely aware of how the computer arrived at that decision. Another way to put it is that the AI program was given thousands of questions and their answers, and it can then generate (or predict) the answer to an entirely new question.

 

Hallucinations (Fabrications): False answers, sometimes consisting of plausible-sounding but incorrect responses and sometimes seemingly “made up” after AI programs make inferences. Like many of the best made-up stories, hallucinations often contain a kernel of plausibility or truth.

Hallucinations are a major downside of generated content. In healthcare they have the potential to cause catastrophic errors. There are ethical, logistical, and pragmatic considerations that go into minimizing hallucinations, almost all of which involve some degree of human oversight and a level of certainty about what constitutes “truth,” a concept that is not always easily discerned. For instance, many lay people think of a death certificate as a final correct answer to what caused someone’s death. Without an autopsy, however, the cause of death is often an educated guess. AI makes educated guesses all the time – but when the educated guess of an AI program is demonstrably wrong it is called a hallucination in popular parlance, although because the program is generating the response, a fabrication may be a more precise term.
 


Large Language Models (LLM)/Natural Language Processing (NLP): LLMs are a subset of NLP that adds deep learning capacity.

Natural Language Processing (NLP) has been around since the 1950s and uses algorithms to translate words (spoken or written) back and forth between code and language. It can be used for simple tasks like basic language translation or generating answers to questions in which the inputs and outputs are well defined. NLP can be quite complex, but it always operates within a predefined algorithm. Large Language Models (LLMs) layer machine and deep learning onto NLP, giving the models the capacity to put language into context, answer questions, and generate human-like conversations. NLP and LLMs both operate on the data sets (words) we give them, but instead of applying only a single algorithm to this data, LLMs apply layers and layers of algorithms and can generate unique, never-before-seen content.
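
A minimal sketch of the difference, using an invented keyword rule for the classic NLP side and the same small open-source model mentioned earlier for the LLM side, might look like this:

```python
# A minimal sketch contrasting a rule-based NLP step with an LLM call.
# The keyword list is invented for illustration; GPT-2 is a small open-source stand-in model.
chief_complaint = "crushing chest pain radiating to the left arm"

# Classic NLP: an explicit, predefined rule the programmer wrote by hand.
cardiac_keywords = {"chest pain", "palpitations", "syncope"}
flag_cardiac = any(keyword in chief_complaint for keyword in cardiac_keywords)
print("Rule-based flag:", flag_cardiac)

# LLM: layers of learned associations generate free text instead of following one rule.
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
prompt = "A patient reports " + chief_complaint + ". The differential diagnosis includes"
print(generator(prompt, max_new_tokens=25)[0]["generated_text"])
```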

 

Machine Learning: Machine learning is an advanced computing technique in which a computer is given both an input data set and outputs. (In other words, both the questions and the answers to a test.) It’s a component of AI in which “the machine” (a set of algorithms) looks at big data, analyzes it for patterns and associations, and tries to predict outcomes based on the patterns it observes. Programs use tools like neural networks (along with other tools like random forests) to make their predictions.

AI programs that ace certification tests in healthcare are a notable example of the strengths and weaknesses of this approach. If we give an AI program thousands of questions and answers from the last three to five years of board certification tests for, let’s say, internal medicine, it could then be used to predict answers to new questions that were not on the prior tests. It would use prior associations between the words in the questions and their answers to predict an answer. For example, it might learn that whenever we ask a question about the thyroid gland, the likeliest answer will involve some sort of endocrine or cancer problem. If a hormonal concern or cancer is one of four potential answers, picking that answer may be correct most of the time. This is an oversimplification, and the computer might miss a known condition like hemochromatosis, which can infiltrate the thyroid without causing a mass. But the program would be right most of the time and would probably beat most human test takers.
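
A minimal sketch of this idea, assuming the scikit-learn library and a tiny invented set of “prior questions,” might look like this:

```python
# A minimal sketch of learning word/answer associations from prior exam questions.
# Assumes scikit-learn is installed; the tiny question set below is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

old_questions = [
    "A nodule is palpated in the thyroid gland of a 40-year-old woman",
    "A patient with fatigue has an elevated TSH and an enlarged thyroid",
    "A 60-year-old smoker presents with hemoptysis and weight loss",
    "A patient presents with a productive cough and fever",
]
old_answer_categories = ["endocrine/cancer", "endocrine/cancer", "pulmonary", "pulmonary"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(old_questions, old_answer_categories)

# A "new" question the model never saw; it predicts from word associations alone.
print(model.predict(["A thyroid mass is found incidentally on imaging"]))
```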

Machine learning (and deep learning, which is an augmented subset of machine learning) is used by generative AI to take one task (like distinguishing between bronchitis and pneumonia) and make an inference to another task (like distinguishing between viral and bacterial respiratory infections). Such leaps are not always accurate! That’s where the feedback mechanisms of deep learning come in. They may be roughly divided into (a) supervised mechanisms (where examples of the desired outputs are given and the program needs to figure out how to get from input to output), (b) unsupervised mechanisms (in which the computer uses its own internal mechanisms to discern an output from a given set of inputs), and (c) reinforcement mechanisms (in which the computer comes up with an answer, and then correct and incorrect outputs are fed back into the program to provide a sort of retrospective feedback).
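
As a minimal sketch of the reinforcement idea in (c), with invented “success rates” and no real clinical model behind it, a program can discover which of two options earns better feedback simply by trying, being graded, and adjusting:

```python
# A minimal sketch of reinforcement-style feedback: the program tries answers,
# is told which were right, and shifts toward the options that scored well.
# The two "strategies" and their success rates are invented for illustration.
import random

success_rate = {"order biopsy": 0.7, "watchful waiting": 0.4}   # hidden "truth" the learner discovers
score = {"order biopsy": 0.0, "watchful waiting": 0.0}
counts = {"order biopsy": 0, "watchful waiting": 0}

for trial in range(1000):
    # Mostly pick the option with the best average feedback so far; sometimes explore.
    if trial == 0 or random.random() < 0.1:
        choice = random.choice(list(success_rate))
    else:
        choice = max(score, key=lambda option: score[option] / max(counts[option], 1))
    reward = 1 if random.random() < success_rate[choice] else 0   # retrospective feedback
    score[choice] += reward
    counts[choice] += 1

print(counts)   # after feedback, the better-performing option is chosen far more often
```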

We will talk more about the importance of human feedback in generative AI in our next segment.

 

Prompting: This term has at least two common meanings in AI: (1) the input given to an AI program when looking for an output (for example, “what is the differential diagnosis of chest pain?”) or (2) human input given to an AI program during the development process, in which prompt-based development is part of beta testing a program after it has completed a period of self-supervised learning (defined below).

Prompting, whether by providing a question for an AI program to answer or by helping fine-tune a program during the development phase, can be characterized by both quantity and quality. How detailed are the prompts given? Are the prompts given before or after the program generates an output? The prompting phase of beta testing is a valuable tool for detecting hallucinations and other undesired outputs.
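
A minimal sketch, with both prompts invented for illustration, of what “quantity and quality” can mean in practice:

```python
# A minimal sketch of prompt quality: the same clinical question asked two ways.
# Both prompts are invented examples; either string could be fed to any text-generation model.
vague_prompt = "What causes chest pain?"

detailed_prompt = (
    "You are assisting a primary care clinician. "
    "List the five most likely causes of acute chest pain in a 55-year-old man "
    "with hypertension and a 30-pack-year smoking history, ordered by urgency, "
    "and flag any diagnosis that requires immediate emergency evaluation."
)

# The richer prompt constrains the model and makes undesired or hallucinated output easier to spot.
for prompt in (vague_prompt, detailed_prompt):
    print(len(prompt.split()), "words:", prompt[:60] + "...")
```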

 

Self-supervised Learning (sometimes called “unsupervised” learning): Self-supervised learning is the process in which an AI program takes unlabeled data and sorts it, classifies it (sometimes by grouping or subdividing it), and generates its own interpretation without outside help. Because unsupervised learning does not rely on labeled data sets, it can use what humans would consider “unconventional” methods to connect the inputs to the outputs.

  • Semi- or partly supervised learning involves prompting the AI program with a few pre-specified answers (for example, “in this image, the chest x-ray demonstrates cardiomegaly”) before the machine learning happens.
  • Reinforcement learning involves prompts that are given after the deep learning process (for example, “You got 7 answers correct and 3 incorrect. Here are the answers you should have gotten to those three questions.”) 

Self-supervised learning is the computer’s first-pass attempt to solve a given problem, just the beginning of creating a useful AI program.
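
A minimal sketch of unsupervised grouping, assuming the scikit-learn library and made-up patient numbers, might look like this:

```python
# A minimal sketch of unsupervised learning: grouping unlabeled data with no answers provided.
# Assumes scikit-learn; the two-feature "patients" below are made-up numbers for illustration.
from sklearn.cluster import KMeans

# Each row is (systolic blood pressure, fasting glucose) for one unlabeled patient.
patients = [
    [118, 85], [122, 90], [115, 88],      # these happen to look alike
    [165, 180], [158, 175], [170, 190],   # and so do these
]

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(patients)
print(clusters)   # the program sorts patients into two groups it defined on its own
```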

----------------------------------------------------------------------------------------------------------------------------

This humble glossary is not meant to be all-inclusive, and it may become quickly obsolete. But we hope it will be helpful to clinicians who want to learn more about generative AI in medicine.

Stay tuned for future articles in this series. Part 2 will briefly review how modern AI programs are created and explain how different AI machines using the same data might have different answers to the same question. Part 3 will consider current uses of AI in healthcare. And our last installment will speculate about what’s next for AI in medicine.

For more information, consider reading: 

  • “How AI Works” by Ronald Kneusel (2024, No Starch Press, San Francisco)
  • Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digit Health 2023 Jun.
  • The Role of Artificial Intelligence in Improving Patient Outcomes and Future of Healthcare Delivery in Cardiology: A Narrative Review of the Literature. Healthcare (Basel) 2024 Feb 16.
  • Requirement of artificial intelligence technology awareness for thoracic surgeons. Cardiothorac Surg 2021.
