Failure analysis

Failure analysis is the systematic study of why something stopped working. The topic being studied can be mechanical, such as a car; structural, such as a bridge; or procedural, such as an airline’s security procedures. This type of analysis is important because it helps prevent similar failures in the future, helps improve the reliability of devices and processes, limits future losses due to similar failures, and helps determine who or what was liable for any damages resulting from the failure.


The failure analysis process is complicated, can involve many steps, and often requires input from many different sources. These can range from human experts to mechanical and chemical tests to various forms of imaging technology. Technology has greatly advanced failure analysis techniques. However, one of the most important factors in the process is the ability of the human analysts to remain objective and unbiased as they search for and identify the root cause of a failure. This helps to ensure that the process is thorough and correctly identifies the problem.

Background

Informal failure analysis is an almost instinctual human behavior. It is likely that even the earliest humans reflected on things that they had attempted to do to determine why they did not get the expected result. Even something as simple as taking food along while moving from one location to another was likely the result of becoming too hungry to continue on a previous trip and attempting to avoid a repeat of that failure. Many improvements in technology and ways of doing things over the centuries were the result of people analyzing why something did not work and finding a better way to do it.

However, as scientific technology began taking huge leaps in the early part of the twentieth century, experts began looking for ways to systematically determine the cause of failures. In these cases, a “failure” is defined as the inability of a device, system, or process to complete the task for which it was designed. One of the best-known early examples of this was the investigative review begun in 1908 after a fatal crash of one of the airplanes designed by Orville and Wilbur Wright.

During World War II in the 1940s, the United States Army experienced repeated failures of one type of ammunition. Army officials ordered an investigation to determine the cause. The resulting approach was formally outlined in Military Procedure MIL-P-1629, Procedures for Performing a Failure Mode, Effects, and Criticality Analysis (FMEA or, sometimes, FMECA), finalized on November 9, 1949.

The Army’s procedure became the basis for similar procedures used by the National Aeronautics and Space Administration (NASA) in the 1950s and 1960s. Variations of FMEA were used during numerous NASA missions, including the Apollo missions in the 1960s and early 1970s, the Skylab missions in the early 1970s, the Viking missions in the mid-1970s, the Voyager missions that began in the late 1970s, and the Magellan and Galileo missions launched in 1989. NASA’s success with FMEA led others in technological fields to adopt the process, including the US Geological Survey and other organizations both in and out of the government.

By the middle of the 1970s, FMEA or similar methods of failure analysis were becoming widespread throughout many industries. The automotive industry adopted its own version and submitted it to the agencies responsible for industry regulation for review and adoption. The first formal failure analysis method specific to the automotive industry was published in 1993 by the Automotive Industry Action Group (AIAG).

The failures that these processes examine can have many causes and multiple levels of complexity. For example, on January 28, 1986, the space shuttle Challenger suffered a catastrophic accident, killing all seven crew members on board. The external fuel tank failed, and the massive amounts of propellant inside burned rapidly, tearing the craft apart seventy-three seconds after takeoff. The tank failed because hot gas and flames leaking from a joint in one of the solid rocket boosters burned a hole in it. The gases leaked because a seal known as an O-ring failed. The O-ring failed because it was being used in a situation for which it was not completely suited and because it was cold that day, and the low temperatures affected the integrity of the ring. A simple analysis would say the accident happened because a fuel tank exploded. A solid failure analysis process determined that an O-ring and an insufficient process for determining safe launch conditions led to the disaster. It also determined steps and procedures to prevent something similar from happening in the future.
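The chain of "because" statements in the Challenger example can be written down as data and traced mechanically. The following is only a minimal sketch: the event labels are paraphrased from the paragraph above, and the `CAUSES` mapping and the `root_causes` function are hypothetical names chosen for illustration, not part of any official investigation method.

```python
# Hypothetical encoding of the causal chain described in the text:
# each event maps to the list of events that directly caused it.
CAUSES = {
    "vehicle breaks apart": ["external tank fails"],
    "external tank fails": ["hot gas and flame leak"],
    "hot gas and flame leak": ["O-ring seal fails"],
    "O-ring seal fails": [
        "O-ring not fully suited to its use",
        "low temperature on launch day",
    ],
}


def root_causes(event, causes):
    """Return the events with no recorded cause that lead to `event`.

    Assumes the cause map is acyclic (no event is its own ancestor).
    """
    direct = causes.get(event, [])
    if not direct:
        # Nothing is recorded as causing this event, so treat it as a root.
        return [event]
    roots = []
    for cause in direct:
        for root in root_causes(cause, causes):
            if root not in roots:  # keep first-seen order, drop duplicates
                roots.append(root)
    return roots


roots = root_causes("vehicle breaks apart", CAUSES)
```

Stopping at "external tank fails" corresponds to the simple analysis; following every link to the end of the map corresponds to the fuller one.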

Overview

Failure analysis is a vitally important process in many industries. It plays a key role in making items and processes that are reliable, safe, and financially feasible. Finding the root cause of failures protects lives, minimizes the economic risks faced by companies and institutions, and helps build trust in these organizations.

The method or methods used in failure analysis depend on the type of failure that occurred. There are many reasons why something might fail. There may be a faulty process or insufficient maintenance or inspection practices. Human error is often a factor and can lead to flaws in design, material selection, manufacturing, or the oversight of procedures; to incorrect assumptions about a process or material; or to misuse of a product or system. Materials used in creating a product or structure can also fail due to inherent imperfections or unexpected circumstances in their use.

Businesses and institutions can use many techniques in the failure analysis process, depending on the nature of the failure. For example, a crew tasked with determining the cause of a plane crash will examine the plane and its components; review voice, data, and video recordings; speak to any survivors who were involved; look at policies and procedures for the airline and airports involved; and examine the background of the flight crew and others. This could involve laboratory analysis of plane components, computer analysis of flight data, interviews, background reviews, and other investigative procedures.

Regardless of the techniques used, all failure analysis follows a basic series of steps. An appropriate team of investigators is chosen and given goals. They will make sure they understand the failure incident and its possible root causes, taking care to ensure this step is thorough and objective. Once all possible anticipated causes are identified, the investigators will systematically examine the probability of each one and determine which is most likely. Further analysis will be done to confirm the root cause.
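The step of weighing each candidate cause and picking the most likely one can be illustrated with a short sketch. The candidate causes and their scores below are invented for illustration, and `rank_causes` is a hypothetical helper; in a real investigation the scores would come from evidence and expert judgment, and the top-ranked cause would still need to be confirmed by further analysis.

```python
def rank_causes(scores):
    """Normalize raw likelihood scores to sum to 1 and sort, highest first.

    `scores` maps a candidate cause to a non-negative score assigned by
    the investigators (hypothetical inputs in this sketch).
    """
    if any(value < 0 for value in scores.values()):
        raise ValueError("scores must be non-negative")
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one score must be positive")
    normalized = {cause: value / total for cause, value in scores.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only; these are not real data.
candidates = {
    "material defect": 2.0,
    "operator error": 1.0,
    "inadequate maintenance": 5.0,
}

ranking = rank_causes(candidates)
most_likely = ranking[0][0]
```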

Once this is done, the investigators will look at all the factors that led to that root cause and identify possible ways to correct it in the future. Each of these corrective options will then be evaluated so the best possible action can be chosen. Finally, the team will conduct whatever testing is possible to confirm that the chosen action works before recommending it for future use.

While conducting a failure analysis, investigators will collect all available information about the failure and its various aspects. This allows them to carefully examine everything possible about the failure to accurately identify the root cause. They may also determine that there were additional causes. For example, in the Challenger disaster, the primary cause was that the O-ring failed to act as a seal and keep hot gases inside the solid rocket booster. However, it failed because of secondary causes, including a design problem with the booster joint and the lack of a launch procedure that could identify that the cold weather could affect the safety of the mission.

The FMEA method, which originated with the US Army and was later adopted by NASA, is among the most widely used methodologies in failure analysis. The investigators attempt to identify every possible way in which every aspect of the item, system, or process under investigation may have failed. The next step is to determine how each of those possible failure points would affect all aspects of the item or system. The investigators will classify each failure by how seriously it affects the use of the system or process and its ultimate purpose, then determine how likely it is that the problem will happen if nothing changes. They will look to determine whether the problem is likely to occur in a specific situation and, finally, identify ways the risk of failure can be minimized or eliminated.
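One common way practitioners turn these classifications into a priority order is a risk priority number (RPN), the product of three ratings: severity, occurrence, and detection. The sketch below assumes the widely used 1-to-10 rating scales; that convention, the `FailureMode` class, the `prioritize` function, and the example ratings are all illustrative assumptions and are not taken from MIL-P-1629 or from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One way an item could fail, with assumed 1-10 ratings."""

    name: str
    severity: int    # how badly the failure affects the system's purpose
    occurrence: int  # how likely the failure is if nothing changes
    detection: int   # how hard it is to catch (10 = hardest to detect)

    def __post_init__(self):
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10")

    @property
    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Sort failure modes so the highest RPN comes first."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Invented example ratings, for illustration only.
modes = [
    FailureMode("seal leaks at low temperature", 10, 3, 6),
    FailureMode("fastener loosens under vibration", 6, 5, 2),
    FailureMode("sensor reports stale reading", 4, 4, 7),
]

ranked = prioritize(modes)
```

A high RPN only suggests where to look first; teams often review any failure mode with a very high severity rating regardless of its overall score.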

One technique frequently used in failure analysis is reconstructing the incident. The investigators will attempt to replicate the conditions that occurred during a vehicle accident, building collapse, fire, or other incident. The reconstruction often involves specific aspects that are identified as the most likely cause of a failure. For example, if a construction crane collapses, investigators might determine the portion of the crane that most likely failed and reconstruct the known conditions at the time of the accident against a similar part to test the strength of the materials used to make that part. They might also use models or computer simulations to test aspects of a failure. In some cases, they may even use full-scale replicas to test their hypotheses about the failure’s cause. Reconstructions can help provide insights into causes and contributing factors that may not otherwise be apparent.

In the twenty-first century, technology has played a critical role in advancing failure analysis techniques. Artificial intelligence and machine learning tools can interpret large volumes of data, recognize patterns, and make predictions about likely failures. Integrating these technologies into the process has made failure analysis more efficient and comprehensive.

Bibliography

“Emerging Technologies Reshaping the Failure Analysis Market: Impact Assessment and Growth Projections (2024-2031).” LinkedIn, 21 Nov. 2024, www.linkedin.com/pulse/emerging-technologies-reshaping-failure-analysis-market-xhele. Accessed 19 Dec. 2024.

“Failure Analysis.” ScienceDirect, www.sciencedirect.com/topics/engineering/failure-analysis. Accessed 19 Dec. 2024.

“Failure Analysis Services.” Element, www.element.com/materials-testing-services/failure-analysis?origin=nts&utm_campaign=NTS-Web-Migration&utm_medium=referral&utm_source=NTS. Accessed 19 Dec. 2024.

“Learn about Quality: FMEA.” American Society for Quality, asq.org/quality-resources/fmea. Accessed 19 Dec. 2024.

“Root Cause Analysis: Challenger Explosion.” Think Reliability, www.thinkreliability.com/case_studies/root-cause-analysis-challenger-explosion. Accessed 19 Dec. 2024.

“Three Reasons to Perform Failure Analysis.” Element, 26 July 2023, www.element.com/nucleus/2016/07/07/three-reasons-to-perform-failure-analysis. Accessed 19 Dec. 2024.

“What is Failure Analysis? Definition and Examples.” Market Business News, marketbusinessnews.com/financial-glossary/failure-analysis. Accessed 19 Dec. 2024.

“What is FMEA?” International Datalyzer, www.datalyzer.com/knowledge/fmea. Accessed 19 Dec. 2024.