Natural Language Understanding in Library Research Platforms - Findings from EBSCO's Natural Language Search Beta

Natural Language Understanding (NLU) is at the forefront of artificial intelligence (AI) in search. A new beta feature on EBSCO Discovery Service (EDS) and EBSCOhost called Natural Language Search combines the parsing intelligence of NLU with the robust search algorithm that EBSCO has developed over decades of testing and configuration specific to researcher needs.

EBSCO is embracing the power of NLU in library research platforms to improve the effectiveness of research and transform the research experience. While NLU is the technology behind the new Natural Language Search mode on EBSCO Discovery Service and EBSCOhost, advanced search is still honored and does not use AI. Combining the power of the EBSCO search engine with NLU, EBSCO is focused on developing an adaptive search that honors all users' search preferences and experience levels.

In August-September 2024, EBSCO conducted beta testing on the new Natural Language Search mode on EBSCO Discovery Service and EBSCOhost with 60 undergraduate users and 30 advanced researcher participants. The following are the findings from this user research.

What is the Natural Language Search Feature?

EBSCO's Natural Language Search is an advanced feature that uses NLU to parse user queries more accurately, ensuring that the user's intent is honored. This leads to more precise search results, which is crucial for researchers who need to find specific information quickly. Accessible via the Advanced Search Modes options, Natural Language Search complements traditional search methods by understanding and honoring the users’ qualifiers and parameters in search without users having to “crack the code” on more advanced search functions until needed further into their research journey.

As one respondent noted:

“I was impressed by the covered and relevancy of the nat. lang. search because it pulled out elements that were not as readily accessible as a pure keyword search. At the same time, I could see the benefit from EBSCO's relevancy algorithm because it didn't seem like an isolated semantic search without context. I think less experienced users will benefit from the ease of using natural language, whereas more experienced users will have the opportunity to craft searches that can still take advantage of keywords/metadata and semantic search (as well as introduce new areas outside of their expertise).”

- Beta researcher participant

Natural Language Search Beta Test

During testing, participants from the advanced user group were given a survey in which they were asked to conduct at least three natural language search queries with the Natural Language Search mode turned off, and then another with Natural Language Search turned on. Participants were asked to rate their initial impressions on a scale of 1-5, then to assess the relevancy of the results of each on a scale of 1-5, and finally to note whether they would turn the feature on if it were available.

Similarly, the undergraduate user testing used an A/B test in which a natural language search query was conducted in the same EDS interface, once with Natural Language Search mode turned on and once with it turned off. Participants were asked about their impressions, asked to assess the relevancy of each on a scale of 1-5, and lastly asked how the Natural Language Search mode compared to other searches they have used.

What Feedback Did You Gather from Beta Testers?

When asked about their overall impressions of Natural Language Search, the relevancy of its results, and their expected use of the search mode, respondents answered as follows:

57% of researchers had a positive impression of Natural Language Search, with 27% finding both Natural Language Search and Traditional search modes useful.
An average of 74% of researchers found Natural Language Search results to be highly relevant.
67% of undergraduate participants reported their search experience with Natural Language Search was better than other search engines they have used for their research needs.
87% of undergraduates said it was easier to get quality search results using Natural Language Search.
For advanced researchers, 63% said they would turn on Natural Language Search as their search mode.

Common themes reflected that Natural Language Search has the potential to complement traditional search methods, where items like questions and qualifiers in search may be better addressed in Natural Language search mode.

Where Beta testers queried with questions:

“It seems very user-friendly in that no advanced searches are required to get started on your research. It simplifies a search and makes it accessible for novices and beginners.” - Beta researcher participant

Where Beta testers searched with non-Boolean qualifiers:

“I used real searches from community college students, and I was completely blown away by how the AI tool could retrieve relevant results from poorly structured searches. When I used the same searches with the traditional proximity search, I would often get 0 hits. For example, a search with: How does dehumanizing language affect us got 0 hits, but the AI tool found many relevant results which were displayed on the first page. Another example: factors in safety measures in high schools’ gun violence. 0 results in traditional search but the AI tool did find some relevant articles. In many other searches I conducted, AI found relevant resources while the traditional search pulled up ones that were not appropriate.” - Beta researcher participant

Survey respondents highlighted that Natural Language Search helps speed up the research process by decreasing the stress of performing a perfect search, and many were “amazed” the search mode picked up on the context of their search. This benefit was especially valued by both undergraduate students and advanced researchers, who appreciated the time saved by the search mode retrieving only articles that met their specific need.

“The tools can grasp the context of a search query that leads to more accurate and relevant search results. This helps in refining searches and providing results that are closely aligned with the user's research question.” – Beta researcher participant

One common theme for improvement was for the Natural Language search mode either to detect that a Boolean query is being performed and automatically switch to the traditional search mode, or to better handle Boolean queries, especially those with NOT as an operator. Another theme was that the Natural Language search mode retrieved too many results, which could be overwhelming to a researcher.
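The first suggestion, detecting a Boolean query and routing it to traditional search, could be sketched roughly as follows. This is an illustrative heuristic only, not EBSCO's implementation: the function names `looks_boolean` and `choose_search_mode`, the mode labels, and the rule that only uppercase AND, OR, and NOT count as operators are all assumptions made for the example.

```python
import re

# Assumption: only uppercase, standalone AND / OR / NOT are treated as
# Boolean operators. Lowercase "or" and "not" are left alone because they
# appear in ordinary natural language questions.
BOOLEAN_OPERATOR = re.compile(r"\b(AND|OR|NOT)\b")


def looks_boolean(query: str) -> bool:
    """Return True if the query contains an uppercase Boolean operator."""
    return bool(BOOLEAN_OPERATOR.search(query))


def choose_search_mode(query: str) -> str:
    """Pick a (hypothetical) search mode label for the query."""
    return "traditional" if looks_boolean(query) else "natural_language"


if __name__ == "__main__":
    for query in (
        "Epidurals but not including childbirth",
        "epidurals NOT childbirth",
        "How does dehumanizing language affect us",
    ):
        print(f"{query!r} -> {choose_search_mode(query)}")
```

Under this rule, a beta query such as "Epidurals but not including childbirth" stays in the natural language mode, while an explicit "epidurals NOT childbirth" is routed to traditional search. A real detector would likely also need to consider parentheses, quoted phrases, and truncation symbols.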

“The contrast between the amount of search results generated when using the Natural Language beta and the traditional proximity search was quite staggering. Often queries that generated as many as 20,000 results when using the NL search would populate as few as 10-500 when using the proximity search tool. Though the large amount of search results does make it a bit more difficult to narrow one's search; this problem is somewhat alleviated by the subject terms.” – Beta researcher participant

There was a wide spectrum of query types from participants. Some treated Natural Language search as a keyword search (beta tester search: “hotdogs”), whereas others entered questions (beta tester search: “What was relationship between the women's suffrage movement and prohibition?”), or specific queries with qualifiers (beta tester search: “Epidurals but not including childbirth”). Some advanced researchers admitted to trying to “break” the search with randomized or difficult questions (beta tester search: “xenomorph art in Egypt or Summer”) to test the tolerance of the search mode with difficult questions. In this example, the beta tester reported that “the result is much more relevant in the second [natural language search], without being perfect… [where] documents have appeared that allow us to create a context from which to focus on the topic.” Further testing is scheduled to investigate the benefits and improvements for each search mode for specific search types.

What Improvements Did Beta Testers Recommend for Natural Language Search?

Based on the beta tester feedback, EBSCO has adjusted the threshold for relevant retrieved documents to reduce the high retrieval count in the result list. EBSCO is also improving how Natural Language search handles Boolean queries, in addition to doing more varied query testing, such as different questions and qualifiers, long queries, and negative searches, to stress-test the bounds of the new search mode.
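The effect of raising a relevance threshold can be illustrated with a minimal sketch. The `ScoredDocument` type, the 0-to-1 score range, the cutoff values, and the sample titles are hypothetical; the source does not describe how EBSCO scores documents or which threshold it uses.

```python
from typing import List, NamedTuple


class ScoredDocument(NamedTuple):
    title: str
    score: float  # hypothetical relevance score between 0 and 1


def apply_relevance_threshold(
    results: List[ScoredDocument], min_score: float
) -> List[ScoredDocument]:
    """Keep documents scoring at or above min_score, best first."""
    kept = [doc for doc in results if doc.score >= min_score]
    return sorted(kept, key=lambda doc: doc.score, reverse=True)


# Made-up results for illustration only.
sample_results = [
    ScoredDocument("Dehumanizing language and group attitudes", 0.91),
    ScoredDocument("Language acquisition in early childhood", 0.40),
    ScoredDocument("Rhetoric and social exclusion", 0.65),
]

if __name__ == "__main__":
    for cutoff in (0.3, 0.6):
        kept = apply_relevance_threshold(sample_results, cutoff)
        print(f"cutoff {cutoff}: {len(kept)} results")
```

A higher cutoff shortens the result list at the risk of dropping marginally relevant documents, which is the trade-off behind the overwhelming result counts that beta testers reported.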

While most beta searches were performed in English, some participants searching in French indicated they would like the Natural Language search to identify the language of their query and weight content in that same language in their results. Another finding was that each participant had a different understanding of what “natural search” meant, so how the search mode is labeled and documented is under further assessment.

Transparency in how Natural Language search is implemented was also a common question.

“With a Boolean search, I can teach users how to engineer a search to work, or reverse engineer a bad search to work with new terms or subject headings. In contrast, there isn't a ready-way to tell why AI search results work or don't work. This is the big problem of working with something that is a black box.” – Beta researcher participant

Librarians want clarity on how the parsing is working and what expansions are happening, if any. In response, there are ongoing tests for adding a layer of transparency that displays how the search was queried and that AI was used to parse the query, along with investigations into combining the benefits of traditional and Natural Language search. While many are still on the fence as to how Natural Language search will be used in its current state, many seemed hopeful and willing to test again once more improvements are made.

Most agreed that Natural Language search would make research at the library easier for novice researchers. One participant said, “I found it very simple and straight forward, I can see this being a very useful tool for students,” while another said, “It will make it easy for students to get relevant results without having to learn advanced search techniques.” Most also agreed that traditional search remains their go-to for conducting and teaching advanced research.

What Did EBSCO Learn from this Beta?

Already, the valuable feedback from participants is driving improvements to the search mode. Overall, the Natural Language search mode seems to complement the traditional search mode. When a user is unsure how to construct a search and a librarian is not there to assist, Natural Language search helps users get in and start searching. Natural Language search was found to be most helpful on simple questions, or questions with qualifiers. While there are still improvements to be made, Natural Language search is already enhancing results for specific search behaviors that more traditional search does not handle well. On the other side, traditional search is as strong as ever at retrieving relevant results from Boolean queries. The lesson from this exercise is the value of bringing these benefits together into one search experience that can accommodate all search patterns, while building additional search benefits along the way.

Another finding is related to AI in general. There are still a lot of uncertainties in how AI can be used in academia, as well as how the AI is working. As with all AI at EBSCO, we are committed to creating teaching materials, documenting how we are using AI, and making sure we work in partnership with both publishers and customers as the understanding of AI, and its appropriate use, is defined.

Want to Learn More?

EBSCO launched the EBSCO AI Beta Program in June 2024. We are publishing executive summaries like this one for each beta, and will publish an academic research article, focusing on how our work and findings fit into the larger scholarship of AI in academia, in early 2025.

Read about AI at EBSCO