Computer vision
Computer vision is a field of artificial intelligence that enables computers to recognize and interpret visual information similarly to humans. This process involves transforming visual data from sources like photographs, videos, infrared, or ultrasonic sensors into actionable insights through three primary stages: image acquisition, image processing, and image analysis. The journey of computer vision began in the mid-20th century, with foundational concepts rooted in earlier advancements such as photography and the theoretical underpinnings of optical flow. Significant milestones include Larry Roberts' pioneering work on extracting 3D data from 2D images and David Marr's influential models that advanced the understanding of visual perception in machines.
Today, computer vision plays a critical role in numerous cutting-edge technologies, such as self-driving cars, drones, and robotics. The technology allows machines to interact with and understand their surroundings, enhancing their operational efficiency. Its applications extend to motion recognition, augmented reality, and user interfaces, incorporating techniques like eye-tracking and gesture control. While the development of effective algorithms poses challenges, the potential for computer vision continues to expand, promising to shape various aspects of modern life significantly.
Computer vision is the ability of a computer to recognize and interpret the content of an image. Put more simply, computer vision refers to a computer's ability to "see" the way humans do. Much like human sight, computer vision is a process through which visual sensation is transformed into visual perception. During this process, a computer receives visual data, analyzes it, and makes some decision about that data. While most of the visual data involved in computer vision is received through normal pictures or videos, it can also come from infrared sensors, ultrasonic sensors, or other sources. Computer vision helps a wide variety of computer-controlled machines to work more efficiently and more intelligently. For this reason, computer vision is a key component of many cutting-edge technologies, including self-driving cars, remote control drones, and robots. Moreover, as computer vision technology continues to evolve, the practical role it plays in modern life is likely to grow along with it.
Brief History
While the concept of computer vision did not begin to develop until the mid-twentieth century, its fundamental roots can be traced back much further. The evolution of computer vision started with the invention of photography by Louis-Jacques-Mandé Daguerre and Joseph Nicéphore Niépce in the 1830s. At the time, the lenses and other visual technologies that made photography possible represented a major scientific step forward. Years later, they also played a key role in the development of computer vision. Through his 1921 play R.U.R., author Karel Čapek popularized the idea of robots, mechanical devices that could move and perform many of the same tasks as humans. Part of the challenge of making robots a practical reality was finding a way for them to perceive and understand the world around them. Essentially, they needed some form of computer vision. In time, camera lenses and the other visual technologies involved in their use came to be viewed as a practical analog for human eyes, one that would allow robots to collect and analyze visual data.
The true development of computer vision began in the 1940s and 1950s, when psychologist James J. Gibson introduced the concept of optical flow, the pattern of apparent motion of objects in a visual scene caused by the relative motion between an observer and the scene. Eventually, the idea of optical flow played an important role in making computer vision a plausible reality. The first major breakthrough came in the early 1960s, when Larry Roberts, the so-called "father of computer vision," wrote a thesis at the Massachusetts Institute of Technology (MIT) on the possibility of extracting 3-D geometric information from 2-D views. Roberts's work generated significant interest in computer vision at MIT and jump-started the effort to make it possible. By the 1970s, MIT was offering a machine vision course at its Artificial Intelligence Lab, in which researchers worked to develop and refine early attempts at achieving computer vision. One of these researchers was neuroscientist and physiologist David Marr, whose work eventually yielded a landmark bottom-up model of how computer vision could work. Thanks in large part to Marr's work, a wide variety of machines capable of some form of computer vision began appearing in the 1980s and 1990s, and new computer vision systems continue to be developed in the twenty-first century.
Overview
At its core, computer vision is meant to provide machines with a way of emulating human vision using some form of digital imagery. In essence, computer vision allows machines to build a visual understanding of the world and make decisions based on that understanding just as humans do. The process through which this is made possible includes three main steps: image acquisition, image processing, and image analysis and understanding.
Image acquisition refers to the process of translating the real world's analog data into binary data made up of ones and zeros that computers can understand. Many different tools and devices can be used to facilitate this process. Some of these include digital cameras, webcams, 3-D cameras, and laser range finders.
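The sampling-and-quantization idea behind image acquisition can be sketched in a few lines of Python. This is a toy illustration, not the behavior of any particular camera: the `analog` brightness function below is a made-up stand-in for the light a real sensor would measure, and each sample is quantized to one of 256 levels (8 bits per pixel).

```python
import math

def acquire(width, height):
    """Sample a hypothetical analog brightness field on a pixel grid
    and quantize each sample to an 8-bit integer (0-255)."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            # Continuous value in [0.0, 1.0] -- a smooth gradient,
            # standing in for real-world light hitting a sensor.
            analog = (math.sin(x / 4) + 1) / 2
            # Quantize to one of 256 discrete levels (ones and zeros:
            # each pixel becomes an 8-bit binary number).
            row.append(min(255, int(analog * 256)))
        image.append(row)
    return image

img = acquire(8, 4)
print(img[0])  # first row of quantized pixel values
```

However the analog signal is produced, the output is the same kind of object: a grid of binary numbers that the later processing stages can operate on.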
The second stage of the computer vision process is image processing. In this stage, the acquired images are processed into low-level information through the application of special algorithms, or sets of rules to be followed in carrying out some sort of calculation or problem-solving operation. These algorithms, which include edge detection, segmentation, classification, and feature detection and matching, help to compress the huge amount of detail found in a normal image into a format that is easier for computers to interpret.
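One of the low-level operations named above, edge detection, can be sketched with the classic 3x3 Sobel horizontal-gradient kernel. This is a minimal illustration in plain Python (real systems typically use a library such as OpenCV, though the arithmetic is the same): large absolute responses mark places where brightness changes sharply, i.e., vertical edges.

```python
# The Sobel x-kernel: responds to left-to-right changes in brightness.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

def sobel_x(image):
    """Return the absolute horizontal gradient at each interior pixel."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Correlate the 3x3 kernel with the neighborhood of (x, y).
            g = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                    for j in range(3) for i in range(3))
            out[y][x] = abs(g)
    return out

# A 5x5 test image: dark on the left, bright on the right,
# so there is a single vertical edge down the middle.
img = [[0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_x(img)
print(edges[2])  # responses peak where the brightness jumps
```

The result is exactly the kind of compressed, low-level description the text describes: instead of every pixel's brightness, the computer now has a map of where the interesting changes are.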
Once image processing is complete, the final stage of the computer vision process—image analysis and understanding—can begin. During this stage, high-level algorithms like 3-D scene mapping, object recognition, and object tracking are applied in concert with the original image data and the low-level information produced through image processing. All of this allows the computer to fully analyze the information it has collected, interpret that information, and make an appropriate decision based on that information.
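A stripped-down version of the recognition-and-decision step can be sketched as nearest-template matching: compare a processed image patch against stored patterns and pick the closest one. The templates and labels below are toy assumptions for illustration; production object recognition uses trained models rather than hand-made patterns, but the decide-by-best-match structure is the same.

```python
# Hypothetical stored patterns the system knows how to recognize.
TEMPLATES = {
    "vertical_bar":   [[0, 1, 0],
                       [0, 1, 0],
                       [0, 1, 0]],
    "horizontal_bar": [[0, 0, 0],
                       [1, 1, 1],
                       [0, 0, 0]],
}

def recognize(patch):
    """Label a 3x3 binary patch with the name of its nearest template."""
    def distance(a, b):
        # Count the pixels where the patch and template disagree.
        return sum(abs(a[y][x] - b[y][x]) for y in range(3) for x in range(3))
    return min(TEMPLATES, key=lambda name: distance(patch, TEMPLATES[name]))

patch = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 1]]           # a slightly noisy vertical bar
decision = recognize(patch)
print(decision)
```

Even with one corrupted pixel, the patch is closer to the vertical-bar template than to the horizontal one, so the system's "decision" tolerates some noise from the earlier stages.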
Developing a computer vision system is often difficult, in large part because of the challenges involved in creating the algorithms such systems require. An ineffective algorithm may return noisy or incomplete data; others may fail to process images in real time; still others may face power or memory limitations that impede the computer vision process. Ultimately, a computer vision system is only as strong as its underlying algorithms.
The range of potential applications of computer vision is quite broad. Computer vision can be used to power a wide range of motion recognition technologies. It can also be used in conjunction with smartphones and other similar devices capable of running augmented reality software that superimposes computer-generated images on the user's view of the real world. Such devices can also incorporate computer vision as part of interface techniques such as eye-tracking and gesture control. Domestic and service robots are often equipped with some form of computer vision that allows them to gather information and interact with the world. Some types of computer vision can even be used to aid in image restoration. Most notably, however, computer vision is playing a key part in the development of self-driving cars, specifically providing a way for these vehicles to see what is happening around them and respond accordingly.