Information retrieval (IR)
Information retrieval (IR) is the process of searching for and obtaining information relevant to a specific topic from various sources, with a significant focus on electronic databases in the digital age. Historically, IR has evolved from a laborious physical search for information stored in scattered locations into a streamlined digital process carried out with computers and the internet. The term "information retrieval" was coined by American computer scientist Calvin N. Mooers, who first used it in published writing in 1950, to describe the search for information whose location, and even existence, is uncertain.
In modern contexts, IR encompasses a range of media, including text, images, videos, and audio. Successful IR hinges on two critical elements: the quality of the system used to access information and the effectiveness of the search query formulated by the user. A skilled information retrieval specialist, such as a librarian, plays a key role in refining queries to produce accurate and relevant results.
With the vast amount of information generated today, having advanced IR systems is essential for managing and retrieving data efficiently. Challenges in IR include the potential for either insufficient or overwhelming amounts of information, as well as the risk of retrieving irrelevant data. Overall, IR represents a dynamic field that integrates knowledge from computer science, linguistics, human behavior, and library science to enhance the search for information in an increasingly complex information landscape.
Information retrieval (IR) is the process of searching for, locating, identifying, refining, and presenting information relevant to a particular topic. The process can be as simple as looking up a contact on one's cell phone or much more complicated, such as compiling the results of multiple medical studies from international databases. IR can involve locating information from any source, and some projects or topics may require investigating a variety of sources to find all the necessary data. However, in the twenty-first century, the most common form of IR involves searching some form of electronic database.
Background
Information retrieval, often abbreviated IR, refers to the tasks involved in finding information in any form, including text, photos, videos, audio recordings, or combinations of these. In that broad sense, IR has existed for as long as information has been recorded in written form. Retrieving information was often difficult, because it could be stored in many locations around the world; finding it usually meant first identifying the right location and then establishing some physical connection to it, such as traveling there or asking someone there to copy and send the information.
This began to change in the middle of the twentieth century, when computers came into use. American computer scientist Calvin N. Mooers was the first to use the term information retrieval. He coined it while writing his master's thesis, and it first appeared in print in 1950 in two of his published works. He defined information retrieval as the search for information whose location is unknown and whose very existence is uncertain. Mooers' definition therefore emphasizes searching for something while having only an idea of what one hopes to find.
For example, an internet search on "fixing a flat tire" can produce results ranging from an advertisement for a nearby garage, to a personal story about someone's experience fixing a flat tire, to several how-to articles with general instructions. While the how-to articles may have been the object of the search, the searcher likely had no specific idea which websites would have these articles, how many existed, how detailed or simple they would be, or whether they would include pictures or videos. The other results may still prove useful, even though finding them was not the original intent of the search. As in Mooers' definition, the searcher did not know what would be found or where exactly it would be found.
The process of IR depends in part on formulating a good search, but several other factors matter as well. Chief among them is a good IR system, which must return results with accuracy, precision, and relevance, and with reasonable speed. Otherwise, the system may fall victim to what is known as Mooers' law, which holds that an information retrieval system will tend not to be used whenever it is more painful and troublesome for users to have information than not to have it.
Overview
The widespread use of computers and the internet ushered in the information age, in which more information could be stored and made available than ever before, increasing the demand for ways to retrieve it. However, the same tools also made it possible to generate more information than ever. Newspapers, magazines, journals, and resource materials that once existed only in hard copy could now exist in electronic form as well. Authors can write books that exist only as images on a screen and are read on electronic devices, and they can add links to additional information. Fortunately, the same devices that make it so easy to produce data also help people locate and retrieve the information they need.
IR systems can be complicated or relatively simple. At their most basic, they include the information, a system to access it, and a person to search for it. There are two key factors in achieving useful results: a good system and a good search query (question).
Computerized IR systems work on an "object and query" basis. The object is the information or item (such as a photo) being searched for, and the query is the request that prompts the system to choose that object as its response. Designing systems that reliably match queries to objects draws on many fields of study, which makes IR a broad area. Building an IR system requires knowledge of computer science and how computers work; information architecture and how computers store information; human psychology and behavior toward information; linguistics and the intricacies of word meanings (to identify alternative terms a searcher may use); information science, to understand the technical aspects of how to sort, store, and protect information; library science, to understand the formal means used to catalog and store information; and statistics, to understand the likelihood of different types of searches.
An IR system must be able to handle records for multiple forms of material, such as books, periodicals, photos, and videos. It must also be able to integrate with other systems. For instance, a large college library's IR system might link to other colleges, to government agencies in multiple countries, and to databases of material on specialized topics such as art, music, and medicine. A system's ability to keep retrieving accurate data in a timely manner as its databases grow larger is called scalability.
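The article does not say how a system matches a query to objects. One widely used structure for this, assumed here rather than described in the text, is an inverted index that maps each term to the objects whose records contain it. The minimal sketch below uses a hypothetical three-item collection (`RECORDS`), a naive whitespace tokenizer, and treats a query as a Boolean AND of its words; all names are illustrative.

```python
from collections import defaultdict

# Hypothetical collection: each "object" has an ID and a short text record
# (a title or caption) that the index is built from.
RECORDS = {
    "book-1": "a field guide to feline health",
    "photo-7": "photo of an adult cat sleeping",
    "video-3": "video on choosing cat food",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(records: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of object IDs whose record contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for obj_id, text in records.items():
        for term in tokenize(text):
            index[term].add(obj_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the objects whose records contain every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the first term's objects and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(RECORDS)
print(sorted(search(index, "cat food")))  # ['video-3']
print(sorted(search(index, "cat")))       # ['photo-7', 'video-3']
```

Searching for "cat" misses `book-1`, which says "feline" instead; that is the gap knowledge of alternative terms is meant to close. Because a lookup touches only the entries for the query's terms rather than every record, an index like this is one common way systems stay responsive as collections grow.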
In addition to a system capable of storing and retrieving large amounts of data, accurate results depend in part on the quality of the query. A skilled IR specialist, such as a librarian, will refine the query so that it has the best chance of retrieving the needed information with the greatest accuracy in the least possible time. Typically, after determining the exact question to be answered, the searcher identifies the most important words in that question, determines any alternative terms that may be used, and decides whether any other limits should be imposed on the search, such as a date range or country of origin for the material to be included. For example, a researcher asked to find information on the best adult cat food in America might build a search that includes the terms cat food, adult cat, and feline (as an alternative term), and then limit the search to articles published in American sources within the previous two years to get the most up-to-date information.
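The cat food search in the paragraph above can be sketched as a filter over database records. This is a minimal illustration under stated assumptions: the `Article` records and their fields are hypothetical, terms are matched as plain substrings of each title, the key and alternative terms are combined with OR, and "the previous two years" is read as a publication year no earlier than the current year minus two, with 2025 assumed as the current year.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """A hypothetical database record; the fields are assumptions for illustration."""
    title: str
    country: str  # country of origin of the source, e.g. "US"
    year: int     # publication year


# Hypothetical records standing in for a bibliographic database.
ARTICLES = [
    Article("Comparing adult cat diets", "US", 2024),
    Article("Feline nutrition basics", "US", 2019),
    Article("Best cat food brands reviewed", "UK", 2024),
    Article("What to feed a feline", "US", 2023),
    Article("Dog food trends", "US", 2024),
]


def matches(article: Article, terms: list[str], country: str, min_year: int) -> bool:
    """True if the title contains any search term (OR) and passes every limit (AND)."""
    title = article.title.lower()
    has_term = any(term.lower() in title for term in terms)
    return has_term and article.country == country and article.year >= min_year


def run_search(articles: list[Article], terms: list[str], country: str,
               current_year: int, years_back: int) -> list[str]:
    """Apply the terms plus the country and date-range limits; return matching titles."""
    min_year = current_year - years_back
    return [a.title for a in articles if matches(a, terms, country, min_year)]


# Key terms plus the alternative term "feline", limited to American sources
# from the previous two years (2025 is assumed to be the current year).
terms = ["cat food", "adult cat", "feline"]
print(run_search(ARTICLES, terms, country="US", current_year=2025, years_back=2))
# ['Comparing adult cat diets', 'What to feed a feline']
```

Without the alternative term "feline", the second result would be missed; without the limits, the older article and the British source would be mixed in.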
IR systems use various retrieval models and algorithms to identify information for the user and to rank results by their relevance to the user's query. For example, the vector space model (VSM) is an algebraic method that represents both the query and each document as vectors and ranks documents by their similarity to the query. Each term is assigned a weighted numerical value according to its importance in retrieving results. Cosine similarity is then used to compare each document's vector with the query's vector. Results with higher cosine similarity are presented to the user first because they are the most likely to contain the information the user wants. Another approach, the probabilistic model, ranks search results by their estimated probability of relevance, using the probability ranking principle and Bayes' theorem. IR systems use these models, along with many other, more complex frameworks, to meet users' demands in the digital age.
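Both ranking models in the paragraph above can be illustrated on a tiny hypothetical collection (`DOCS`). The sketch makes assumptions the text does not specify: VSM term weights are computed with tf-idf (a term's raw count multiplied by log(N/df), where N is the number of documents and df the number containing the term), and the probabilistic model is the binary independence model with no relevance judgments, which assumes a 0.5 chance that a query term appears in a relevant document and estimates its rate in non-relevant documents from df, with a 0.5 smoothing correction. Real systems use many refinements of both.

```python
import math
from collections import Counter

# Hypothetical collection of short documents, keyed by ID.
DOCS = {
    "d1": "how to fix a flat tire",
    "d2": "garage offers flat tire repair",
    "d3": "best adult cat food",
    "d4": "feline nutrition for adult cats",
    "d5": "choosing a reliable used car",
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


# ---- Vector space model: tf-idf weights compared by cosine similarity ----

def idf_table(docs: dict[str, str]) -> dict[str, float]:
    """Inverse document frequency: terms found in fewer documents weigh more."""
    n = len(docs)
    df = Counter(t for text in docs.values() for t in set(tokenize(text)))
    return {t: math.log(n / count) for t, count in df.items()}


def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """Sparse vector: each term's count in the text times its idf weight."""
    counts = Counter(tokenize(text))
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def vsm_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Document IDs ordered from most to least similar to the query."""
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scores = {d: cosine(q, tfidf_vector(text, idf)) for d, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# ---- Probabilistic model: binary independence model, no relevance data ----

def bim_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank by the summed log-odds weights of the query terms each document contains."""
    n = len(docs)
    doc_terms = {d: set(tokenize(text)) for d, text in docs.items()}

    def weight(term: str) -> float:
        df = sum(term in terms for terms in doc_terms.values())
        # With p = 0.5 and u estimated from df, the weight reduces to an
        # idf-like log ratio; the 0.5 terms smooth it and avoid division by zero.
        # Terms found in more than half the documents get negative weights.
        return math.log((n - df + 0.5) / (df + 0.5))

    query_terms = set(tokenize(query))
    scores = {d: sum(weight(t) for t in query_terms & terms)
              for d, terms in doc_terms.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(vsm_rank("fix flat tire", DOCS)[:2])  # ['d1', 'd2']
print(bim_rank("fix flat tire", DOCS)[:2])  # ['d1', 'd2']
```

Both models rank `d1`, the only document containing every query term, ahead of `d2`, which shares only the more common terms "flat" and "tire".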
Creating good queries and having a well-developed system help avoid the key problems in information retrieval: retrieving too little information or too much. Poor systems or searches can also return irrelevant or inaccurate information.
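The article does not name a way to measure these problems. Two standard measures from the IR literature, assumed here, are precision (the share of retrieved items that are relevant) and recall (the share of relevant items that were retrieved). In the hypothetical example below, with made-up item IDs and relevance judgments, a narrow search and an overly broad one each fail on a different measure.

```python
def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search; 0.0 when a denominator is empty."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical judgments: four relevant articles exist in the collection.
relevant = {"a1", "a2", "a3", "a4"}

# Too little: everything returned is relevant, but most relevant items are missed.
too_little = ["a1"]
# Too much: every relevant item is found, but buried among 12 irrelevant ones.
too_much = ["a1", "a2", "a3", "a4"] + [f"x{i}" for i in range(12)]

print(precision_recall(too_little, relevant))  # (1.0, 0.25)
print(precision_recall(too_much, relevant))    # (0.25, 1.0)
```

A well-formed query run against a good system aims to keep both numbers high at once.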