Chemometrics
Chemometrics is a specialized area within the field of chemistry that applies mathematical and statistical methods to analyze chemical data, enabling researchers to extract meaningful insights. Emerging primarily in the late twentieth century alongside the rise of computing technologies, chemometrics has become integral to various branches of chemistry, including analytical, organic, and physical chemistry. The process typically involves several key steps: data collection from various experimental techniques, identification and evaluation of unexpected data, and comprehensive analysis using advanced software.
The foundational data can vary, falling into univariate or multivariate categories, with the latter often requiring sophisticated tools to explore multiple variables simultaneously. During analysis, scientists must address outliers—data points that deviate significantly from the norm—to ensure the integrity of their results. Chemometrics finds applications across diverse industries, from manufacturing to agriculture, highlighting its multidisciplinary nature and importance in contemporary scientific research. Overall, this field aids chemists in organizing and interpreting vast amounts of data, ultimately leading to new theories, models, and practical applications.
Chemometrics is the science of applying mathematical and statistical techniques to chemistry data to extract useful information. A relatively new field, chemometrics evolved during the computer era and, by the twenty-first century, became an important feature of chemistry. The basic steps of chemometrics involve gathering data, finding and examining unexpected data, and then processing and analyzing the corrected data using computer software. The results, or output, of a chemometric study may be used in a variety of fields and industries.
![A gas chromatography lab, where analytical chemistry uses chemometrics. Hey Paul from Sacramento, CA, USA [CC BY 2.0 (https://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons](https://imageserver.ebscohost.com/img/embimages/ers/sp/embedded/rssalemscience-20180712-9-171817.jpg?ephost1=dGJyMNHX8kSepq84xNvgOLCmsE2epq5Srqa4SK6WxWXS)

Brief History
For thousands of years, people have sought to experiment with new ideas and draw useful information from their findings. With the development of modern forms of science, this practice became increasingly standardized, with different fields of study following different procedures to achieve their goals. For example, scientists in biology use biometrics to study data. Psychologists use psychometrics, and engineers use technometrics. Inevitably, the important science of chemistry would also develop its own methods of analyzing data and variables.
The concept of chemometrics, the science of gaining information from data in chemistry, evolved mainly in the later twentieth century during the rise of the computer. The term chemometrics likely originated with Swedish chemist and theorist Svante Wold in 1972. Other theorists adopted the idea and developed it over the following years and decades, often in parallel with important milestones in scientific and technological development. Among the most influential theorists of chemometrics were Paul Geladi, Ian Cowe, and Harald Martens.
After decades of use, chemometrics is arguably more important than ever. Thanks to advances in scientific learning and computer capabilities, a tremendous and ever-growing amount of chemistry data is now available. Many experiments yield thousands or even millions of data points that require careful organization and analysis. Scientists of the twenty-first century are challenged to analyze data from many sources and combine the information into forms in which it can be used most effectively and efficiently.
Overview
Chemometrics is a broad term that encompasses many techniques by which chemists and other scientists gain insights and information from data relating to chemical systems. It plays a central role in many branches of chemistry, including organic chemistry, theoretical and physical chemistry, and analytical chemistry. The techniques of data management used in chemometrics link the practice to other fields not exclusive to chemistry, such as statistics and engineering. The findings of chemometric studies are important to a wide variety of fields and industries worldwide, including manufacturing, biology, computing, mathematics, and even agriculture. For these reasons, proponents of chemometrics consider it a highly multidisciplinary field.
The first step in chemometrics is to gather the main components of the study, namely scientific measurements. Chemists gather their measurements and similar data from a great range of sources. Some scientists use chromatography or light-based spectroscopy. Others study the physical properties of substances in more direct ways, such as by analyzing their components, temperature, melting point, or ability to flow over surfaces. During the course of these investigations, chemists gather facts and other data, sometimes numbering in the hundreds of thousands or more. In modern times, scientists generally must store and analyze these findings within computer systems.
Data can take many forms but generally falls into two main categories: univariate and multivariate. Univariate data is generally simpler because it involves only one variable. Multivariate data is usually more complex because it involves two or more variables, and potentially dozens or more. Analyzing data from many variables requires much more effort, including varied technological tools, measuring systems, and fields of study. Chemists may be greatly challenged to determine the relationships within the data: whether and how variables differ, depend on each other, or behave independently.
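The difference between the two categories can be illustrated with a short Python sketch. It is only an illustration: the measurement values and variable names are hypothetical, and the Pearson correlation coefficient is just one of many ways to examine how two variables relate.

```python
import math
import statistics

# Hypothetical measurements for five samples (illustrative values only).
melting_point_c = [120.1, 121.4, 119.8, 122.0, 120.7]  # a single variable
absorbance = [0.10, 0.20, 0.30, 0.40, 0.50]            # two variables measured
concentration = [1.0, 2.1, 2.9, 4.2, 5.0]              # on the same samples


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Univariate view: one variable summarized on its own (mean, sample std. dev.).
univariate_summary = (
    statistics.fmean(melting_point_c),
    statistics.stdev(melting_point_c),
)

# Multivariate view: how strongly do two variables move together?
r = pearson(absorbance, concentration)
print(univariate_summary, r)
```

With dozens of variables, the same question applies to every pair of columns, which is one reason multivariate work relies on dedicated software.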
Scientists may gather a massive amount of data, but this data by itself is seldom very useful. To extract the needed information, scientists employ a range of mathematical and statistical techniques. These techniques may serve many goals, ranging from testing food for toxicity to engineering new lightweight building materials. In any case, chemometrics helps chemists remove redundant information and information that appears to be inaccurate, and then reach conclusions in the form of new theories, equations, or models.
The initial data used in chemometrics is called the input. The input is generally the most important feature of chemometrics, since faulty or irrelevant data will not yield accurate or useful results. Scientists must transfer the input into the computer, whose software will perform the chemometric processes. This may sound simple, but scientists often face great difficulties in converting file types and obtaining rights to the proprietary software needed for specific tasks.
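As a rough sketch of the input step, the following Python snippet reads a small text export into numeric rows and sets aside records that cannot be converted. The file layout, column names, and the `load_input` helper are assumptions made for illustration; real instrument formats vary widely and are often proprietary.

```python
import csv
import io

# Hypothetical instrument export: one row per sample, one column per variable.
RAW_EXPORT = """sample,abs_400nm,abs_500nm,abs_600nm
A,0.12,0.45,0.30
B,0.15,,0.33
C,0.11,0.44,n/a
D,0.13,0.47,0.31
"""


def load_input(text):
    """Return (variable names, numeric rows by sample, rejected sample ids)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    variables = header[1:]
    rows, rejected = {}, []
    for record in reader:
        if not record:  # skip blank lines
            continue
        sample, values = record[0], record[1:]
        try:
            rows[sample] = [float(v) for v in values]
        except ValueError:
            # Blank or non-numeric fields: set aside for manual review.
            rejected.append(sample)
    return variables, rows, rejected


variables, rows, rejected = load_input(RAW_EXPORT)
print(variables, rows, rejected)
```

Keeping a list of rejected records, rather than silently dropping them, leaves a trail that can be checked against the original measurements.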
Once the data has been input into the software, scientists use the software as well as their own experience to find and analyze outliers. Outliers are pieces of information that seem significantly different from related information. Some outliers are legitimate and merely show an unusual or unexpected result of the experiment. These outliers must be retained to accurately demonstrate the true variation of the results. However, other outliers reflect clearly inaccurate information, such as miscalculations, typos, or other human or computer errors. Scientists generally remove these erroneous outliers to keep them from compromising the integrity of the overall data set.
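One possible way to find candidate outliers is sketched below in Python. The article does not prescribe a method; this example assumes a modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The readings are hypothetical. The function only flags values, because deciding whether a flagged value is a legitimate result or an error still requires the scientist's judgment.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median: the score is undefined, so flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical replicate readings; 101.0 looks like a misplaced decimal point.
readings = [10.1, 10.3, 9.9, 10.2, 10.0, 101.0, 10.4]
flagged = flag_outliers(readings)
print(flagged)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value distorts them far less.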
Once scientists have addressed outlying data, the data is ready for processing and analysis. First, scientists and software apply a preprocessing method appropriate to the task at hand. These methods may include normalization, baseline correction, mean centering, or other techniques designed to put the data in the most useful form and make the needed information more accessible. Then, the software can begin analyzing the data via a wide range of possible methods chosen or designed to meet the goals of the project. The result of the analysis, known as the output, represents the final findings of the chemometric process. Scientists may use software or their own knowledge to develop models, predictions, classifications, and other products based on the information gained.
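The three preprocessing methods named above can be sketched in a few lines of Python. These are deliberately simple variants chosen for illustration (a constant-offset baseline, scaling to unit maximum, and column-wise mean centering); the function names and spectra are hypothetical, and production software typically offers more sophisticated versions of each.

```python
import statistics


def baseline_correct(spectrum):
    """Subtract a constant offset (the minimum) so the baseline sits at zero."""
    offset = min(spectrum)
    return [v - offset for v in spectrum]


def normalize(spectrum):
    """Scale a spectrum so its largest value equals 1."""
    peak = max(spectrum)
    return [v / peak for v in spectrum] if peak else list(spectrum)


def mean_center(matrix):
    """Subtract each column's mean so every variable averages to zero."""
    means = [statistics.fmean(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


# Two hypothetical spectra with the same shape but different offset and scale.
spectra = [
    [1.0, 3.0, 2.0],
    [2.0, 6.0, 4.0],
]
corrected = [normalize(baseline_correct(s)) for s in spectra]
prepared = mean_center(corrected)
print(corrected, prepared)
```

In this example, both spectra become identical after baseline correction and normalization, so mean centering leaves only zeros. The differences between the raw spectra came from offset and scale alone, which is the kind of uninformative variation preprocessing is meant to remove before analysis.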