Looking at the news, it can seem as though we have stepped into an episode of the Twilight Zone, where things that were not possible (or at least not easily done) in the past now seem relatively trivial with modern-day AI.
When used responsibly, AI can help accomplish amazing things: Digital Twin technologies that allow hyper-realistic simulation and modeling of real-world scenarios for safer aircraft manufacturing; AI predictions that help prevent maternal mortality; digitizing handwritten, one-of-a-kind archival materials so more information can be shared; subsurface science (critical infrastructure underground) demonstrations where AI assists in policy making; AI helping to discover more efficient battery materials in green science; and much more.
AI without responsible boundaries can lead to harms that are best avoided, as news reports and research have shown. With proper boundaries, regulations, and tenets in place, AI can lead to faster and more efficient research and breakthroughs in innovation, while also protecting its use from bad actors, as researchers at George Washington University have cautioned.
Part of the struggle in placing and abiding by responsible AI practice is that AI as a label is ambiguous because it does not have a standardized definition. Is AI machine learning? Is a merge and deduplication effort deemed AI? Is AI a simple script? It is hard to determine what is responsible in AI when there are so many ways to define AI.
The EU AI Act defines AI in Article 3.1 as
“a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments (2024).”
UNESCO defines generative AI (GenAI) as
“technology that automatically generates content in response to prompts written in natural-language conversational interfaces. Rather than simply curating existing webpages, by drawing on existing content, GenAI actually produces new content. The content can appear in formats that comprise all symbolic representations of human thinking: texts written in natural language, images (including photographs, digital paintings and cartoons), videos, music and software code. GenAI is trained using data collected from webpages, social media conversations and other online media. It generates its content by statistically analysing the distributions of words, pixels or other elements in the data that it has ingested and identifying and repeating common patterns (for example, which words typically follow which other words) (2023, p.8).”
The United States Congressional Research Service defines AI as
“computerized systems that work and react in ways commonly thought to require intelligence…[and] refers to machine learning (ML) models developed through training on large volumes of data in order to generate content (2023).”
There are many definitions of AI beyond these, and many were written before the Large Language Models (LLMs) most widely used today, but most definitions at least agree that AI is an unsupervised model trained on vast amounts of mostly open web data and used to generate text and images based on statistical predictions learned from that data.
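To make the idea of "which words typically follow which other words" concrete, here is a minimal toy sketch in Python. This is not EBSCO's implementation and is far simpler than any real LLM: a bigram model that counts word-following patterns in a tiny made-up corpus and samples a continuation from them. The corpus, function name, and prompt word are invented purely for illustration.

```python
from collections import Counter, defaultdict
import random

# Toy corpus, invented for illustration only.
corpus = (
    "responsible ai supports research . "
    "responsible ai supports libraries . "
    "ai supports responsible research ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word: str) -> str:
    """Return a likely next word, sampled from observed frequencies."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate a short continuation from a prompt word.
word = "responsible"
output = [word]
for _ in range(4):
    word = predict_next(word)
    output.append(word)
print(" ".join(output))
```

Real LLMs learn far richer patterns over billions of documents, but the core intuition is the same: the model statistically predicts what is likely to come next, based on what it has ingested.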
Aside from the definition issue, AI is the new hype word: many products now claim to use AI when they may not be using it at all. Mislabeling what is and is not AI makes it harder to enforce ethical and responsible AI, and it also makes it harder for consumers to identify what is AI (or generated by AI) and take precautions when using it. With new regulations like the EU AI Act, correct labeling of what uses AI or is generated by AI matters even more, and that requires a standard definition so that companies and users can identify what they are using, or what has been generated by AI, and treat it responsibly.
To make sure EBSCO can identify and label which product features use or are generated by AI, and communicate that appropriately to our users, we will take inspiration from the authoritative sources mentioned here and define AI at EBSCO as “a general term for machine learning processes where the model learns from a vast amount of unstructured and untagged information, usually from the open web, where it learns linguistic properties, such as the way to construct sentences, how people talk about certain topics, and how to contextualize what those words mean when entered together, all to predict how to respond to users’ needs entered as a query or prompt to the model.”
Throughout this series, we will walk through the AI tenets that EBSCO is using, show where we are using AI, and highlight how we are supporting responsible research through responsible AI.
Follow along as we dive into each of the six AI tenets in the coming months.
- Quality: Using authoritative data and resources to ground AI in sources of truth, with human-in-the-loop vetting by librarians
- Transparency: Clear labeling of AI features and explainable AI to support informed decision-making
- Information Literacy: Partnering with librarians to teach how to use AI, what responsible AI is, how to assess AI outputs for research, and what acceptable AI use looks like
- Equality: Grounding in diverse and vetted data, providing equal access to content regardless of research experience, primary language, or domain expertise
- End-User Value: User-first AI that is tested and vetted by users, with UX focused on how AI can be responsibly used for more effective research
- Data Integrity: EBSCO AI will comply with the same data policies and procedures currently in use, protecting the privacy of user data and the rights of copyright holders