Evaluation Research
Evaluation research is a systematic approach that employs scientific methodologies to assess whether a specific program, policy, or initiative has successfully met its declared objectives. It involves several critical steps, including goal development, planning, implementation, evaluation, and feedback. Organizations often implement new programs to adapt to market changes or regulatory requirements, making it essential to evaluate the outcomes to ensure effectiveness. This evaluation process not only measures success but also identifies any negative impacts or unforeseen consequences that may arise from the initiative.
To conduct effective evaluation research, it is crucial to establish clear criteria for success, which can range from immediate employee reactions to long-term societal impacts. The process may draw on both quantitative and qualitative data to gauge a program's effectiveness and guide necessary adjustments. Ultimately, evaluation research serves as a vital feedback system that helps organizations refine their programs and maintain competitiveness in an ever-evolving marketplace. By valuing objective measurement over subjective perception, organizations can make informed decisions that enhance their operational efficacy and achieve desired outcomes.
On this Page
- Overview
- Further Insights
- The Evaluation Research Process
- Figure 1: Program Development Life Cycle
- Establishing Goals
- Collecting Support Data
- Planning
- Implementation & Evaluation
- Issue
- Defining Criteria of Success
- Figure 2: Levels of Evaluation Criteria (adapted from Kirkpatrick, 1998)
- Conclusion
- Terms & Concepts
- Bibliography
- Suggested Reading
Evaluation Research
Organizations frequently need to implement new programs, policies, or other initiatives in order to gain or maintain a competitive edge in today's rapidly changing marketplace. Evaluation research is the application of scientific methodology and research techniques to determine whether or not a program has met its stated goals. There are several steps in the program development life cycle: goal development, planning (including planning how the program will be evaluated), implementation, evaluation, and feedback based on the evaluation. In choosing how best to evaluate the effectiveness of a program, policy, or other initiative, the organization must develop appropriate criteria of success ranging from the reaction of affected personnel to the impact of the initiative on society or the environment.
Keywords: Business Model; Criterion; Evaluation Research; Policy; Psychometrics; Return on Investment (ROI); Scientific Method; Statistics; Survey
Overview
Many experts liken the organization to an organism that must constantly adapt to its environment if it is to survive. Sometimes the organization must change its policies and procedures to meet the requirements of new government regulations. Other times, it must change its business model to adopt new technology expected by its customer base. Still other times, it must change the way it deals with its employees, increasing their motivation, job satisfaction, or job knowledge in order to become a high-performing organization. In short, an organization that does not constantly change to meet the needs of its environment will fail.
For most large changes, the organization may need to implement new programs to launch and/or support the change. For example, the installation of a new inventory database may require a training program to teach employees how best to leverage its features into better customer service. The need to motivate the marketing staff to become more productive may involve the implementation of a new incentive or commission program to tie pay to performance. New government regulations for recycling may require programs to ensure that employees meet the new requirements so that the organization does not run the risk of fines. However, as good as these programs may sound on paper, it cannot be assumed that they will succeed when implemented. Rather, it is important to perform evaluation research to determine whether the program has met its goals and, if it has not, how it can be improved.
Evaluation research is the application of scientific methodology and research techniques to determine whether or not a program has met its stated goals. Evaluation research is also often used to determine whether or not a program or policy has had negative impact or unexpected consequences that need to be addressed. In general, evaluation research can be thought of as the application of the scientific method to determining whether or not a program, policy, or other initiative has had the intended effect.
Further Insights
The Evaluation Research Process
As shown in Figure 1, the development of a new program or policy proceeds in several steps. First, one must specify what goals the new program, policy, or other initiative is to meet. Although this may sound simple (e.g., teach employees a new billing procedure), in actuality it is often a complex task. The goal must be operationally defined so that once the program has been implemented, its benefits (or lack thereof) can be observed and measured. The criterion of success (a dependent or predicted measure used to judge the effectiveness of persons, organizations, treatments, or predictors) must be chosen carefully so that it has meaning in the real world and actually reflects the impact of the program on the organization's goals. For example, training programs are often evaluated using such measures of "success" as whether the trainees liked the course and believed they learned from it (neither opinion is necessarily grounded in reality) or whether they passed an end-of-training test. However, as any schoolchild returning from summer vacation can attest, passing a test at the end of a class does not necessarily mean that the information will be retained.
Figure 1: Program Development Life Cycle
Establishing Goals
An essential part of evaluating the effectiveness of an organizational program, policy, or other initiative is determining what goal the organization is trying to achieve. To say, for example, that one is instituting a new procedure for arranging inventory in the stockroom in order to increase the efficiency with which it can be retrieved is all well and good. However, if arranging the stock under the new procedure costs more per piece than the efficiency of retrieval saves, then the return on investment does not justify implementation. It would be short-sighted to call the program a success merely because it was successfully implemented when it actually costs the organization more than it saves. For this reason, it is important to determine carefully what the real goals of the program are, both to articulate the vision so that the program can be designed to meet those goals and to develop a priori criteria that can be used during the evaluation stage to judge objectively the program's success or failure vis-à-vis those goals.
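To make this return-on-investment logic concrete, the short Python sketch below works through the stockroom example with invented figures. All the numbers are illustrative assumptions; the point is that a program can be implemented flawlessly and still produce a negative return.

```python
# Hypothetical figures for the stockroom example; none of these values
# come from the article -- they exist only to illustrate the ROI logic.
pieces_per_year = 50_000
extra_cost_per_piece = 0.10   # added handling cost under the new arrangement
savings_per_piece = 0.07      # value of faster retrieval, per piece

annual_cost = pieces_per_year * extra_cost_per_piece    # 5,000
annual_savings = pieces_per_year * savings_per_piece    # 3,500

# ROI as a percentage: (gains - costs) / costs * 100
roi = (annual_savings - annual_cost) / annual_cost * 100

print(f"Net annual effect: {annual_savings - annual_cost:+,.2f}")  # -1,500.00
print(f"ROI: {roi:.1f}%")  # -30.0%: successfully implemented, yet a failure
```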
Collecting Support Data
Sometimes it is impossible to collect quantitative data to evaluate a program. For example, if a grocery store wanted to determine the change in customer satisfaction resulting from a change in floor plan, meaningful hard data may not be available, particularly in the short term. What is needed in such a situation is a way to determine customer satisfaction (or, ideally, the change in customer satisfaction) that results from the new store layout. In many cases, customers have a preferred grocery store and will continue to shop there despite all but the most egregious changes. Even if customers do leave to shop elsewhere, the change may be gradual, leaving the store without the feedback it requires in time to make adjustments. In such cases, it is necessary to use subjective data to acquire the feedback needed to evaluate whether or not the program (in this case, the new floor plan) is working as expected. Although data such as customers' opinions of a new floor plan are subjective in nature (as opposed to the objective data of whether customers continue to shop at the store after the new floor plan is implemented), if handled appropriately they can give the organization meaningful feedback on the effectiveness of its programs. Psychometrics, the science and process of mental measurement, provides the tools to capture an individual's intangible attitude or opinion adequately and accurately.
To develop a good data collection instrument, the concepts of interest first need to be operationally defined in terms that can be observed and measured. In the example of the grocery store layout, one needs to determine what "liking" or "disliking" the new floor plan means in practical terms that are relevant to the business of the grocery store. To develop a set of scales to measure this concept, the designers of the survey might first interview a representative sample of customers and/or draw on their own experience in trying to find things in grocery stores. Based on this information, they could then draft a list of questions to elicit meaningful feedback about the shoppers' experience. Questions might include:
- How long does it usually take you to do your shopping?
- How long does it take you with the new floor plan?
- Did you find everything you needed today?
- What items couldn't you find?
- How many times did you have to ask for directions to find an item?
- Was the signage helpful?
- How likely are you to continue to shop at this store given the new floor plan?
The questions would then be refined and placed on a quantifiable scale so that the responses could be statistically analyzed and used to provide meaningful feedback to the grocery store concerning the new floor plan. Using the principles of good psychometric question design, the researchers would then develop a survey instrument that could be given to a representative sample of customers to determine the effectiveness of the new floor plan in meeting the goals of the organization.
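As an illustration of what the subsequent analysis might involve, the sketch below scores hypothetical responses to four refined questions on a 1-to-5 Likert-type scale and computes Cronbach's alpha, a standard psychometric check of internal consistency. The responses are invented and NumPy is assumed; alpha is only one of several checks a survey designer might apply.

```python
import numpy as np

# Invented 1-5 Likert responses: rows are 6 shoppers, columns are 4
# refined floor-plan questions (higher = more satisfied).
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 4, 3, 3],
])

def cronbach_alpha(items: np.ndarray) -> float:
    """Estimate internal consistency of a multi-item scale."""
    k = items.shape[1]                            # number of items
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)

print(f"Mean satisfaction: {responses.mean():.2f} / 5")
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")  # ~0.93 here
```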
Planning
Once the goal of the program has been articulated and operationally defined, the next step is to plan the intervention. This may be the design and development of a training course, a new policy or set of policies, reengineered business processes, or another business initiative. The planning needs to be done with the goal in mind and also with a view to how the intervention will be evaluated. If a program, policy, or other initiative is implemented in such a way that its effectiveness cannot be evaluated, it will be impossible to determine whether it met the goals of the organization or yielded an acceptable return on investment. Part of the planning process should therefore be the development of an evaluation plan specifying how the effectiveness of the program will be measured.
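Because the evaluation plan is drawn up before implementation, it helps to record it as a concrete artifact rather than leave it implicit. Below is a minimal sketch of what such a record might contain, written as a hypothetical Python dataclass; the field names and example values are illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class EvaluationPlan:
    """Illustrative record of how a program will be judged (all fields assumed)."""
    program: str
    goal: str                 # the operationally defined goal
    criterion: str            # observable, measurable criterion of success
    measurement: str          # how and when the criterion will be collected
    evaluate_after_days: int  # time allowed for the program to take effect
    success_threshold: str    # a priori definition of "success"

plan = EvaluationPlan(
    program="New inventory database training",
    goal="Staff retrieve stock records without assistance",
    criterion="Share of lookups completed unassisted",
    measurement="System logs, sampled over two weeks",
    evaluate_after_days=60,
    success_threshold=">= 90% of lookups unassisted",
)
print(plan)
```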
Implementation & Evaluation
Once the program has been designed and developed, it is implemented (e.g., the training course is run, the new policy goes into effect). After an adequate amount of time has passed for the program to take effect (e.g., for employees to become proficient in a new work method or for a new policy to take hold), the evaluation plan developed in the planning stage is carried out. This plan should be based on the application of the scientific method. If quantitative data are collected as part of the plan, they should be statistically analyzed so that the significance of the results can be determined. This information can then be used as feedback for the organization to determine the effectiveness of the program and whether any parts of the implementation need to be changed. If it is determined that the program needs to be modified to better meet the needs of the organization, then new goals are set, and the process continues until the program or other initiative either meets the goals or is replaced.
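For example, if the criterion were task-completion time measured for the same employees before and after a training course, the significance test might look like the sketch below. The data and the choice of a paired t-test are illustrative assumptions (SciPy is assumed available); the appropriate test depends on the actual evaluation design.

```python
from scipy import stats

# Invented minutes-per-task for the same 8 employees before and after
# the training program (paired observations).
before = [42, 38, 51, 45, 39, 47, 44, 50]
after = [35, 36, 44, 40, 37, 41, 39, 43]

# Paired t-test: did mean task time change significantly?
result = stats.ttest_rel(before, after)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

# The outcome feeds back into the program life cycle: a non-significant
# difference suggests revisiting the program or its evaluation window.
if result.pvalue < 0.05:
    print("Statistically significant change in task time")
else:
    print("No detectable effect yet")
```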
Issue
Defining Criteria of Success
One of the most difficult aspects of evaluation research is determining how best to operationally define the criteria of success for the program, policy, or other initiative under evaluation. To be meaningful, criteria need to be observable and measurable. To conclude that one's employees think highly of a new policy, for example, simply because they do not appear to be complaining is to court disaster. Rather, measurement instruments need to be developed, or observable hard criteria of success defined, so that the scientific method can be applied to the evaluation and the organization can draw reasonable conclusions about the efficacy of its programs based on the results.
The question, of course, is how best to do this. If one waits long enough, one can judge the value of a program definitively in retrospect. However, at that point it is too late for the feedback that would allow the organization to make mid-course corrections to increase the program's effectiveness. On the other hand, if one tries to evaluate the effectiveness of a program immediately after it has been developed, before it has become established, the data collected are likely to be meaningless.
Figure 2 shows six levels of evaluation criteria used to judge the effectiveness of training programs, adapted from Kirkpatrick's (1998) four-level framework. Most of these are also applicable to the evaluation of other organizational programs, particularly those in which employees need to learn and implement new policies and procedures. In the long run, one would ideally use long-term, ultimate criteria of success (Levels 5 and 6). These are measures of effectiveness that are collected after all the data are in; the ultimate criterion can be measured only after the program is over (e.g., when it is ended or replaced with another program). More realistically, one might look at the return on investment of a new training program or a new set of business processes, or at the effectiveness of practices meant to make the organization and its processes "greener."

Green programs are typically implemented in part to reduce the organization's carbon footprint or to otherwise be more environmentally friendly. The intention, however, is typically not completely altruistic: although the primary intent may be to help the environment, there is usually another business reason as well. Many organizations believe that if they are seen as environmentally friendly, they will attract more customers to whom environmentalism is important. Attracting more customers, however, is a result that occurs only over time and is typically not immediately available as a criterion of effectiveness. Even when customers are immediately attracted to the organization because of a program or policy, it is more important to know how sustainable this increased customer base is over the long term. It can take even longer to see the impact of the organization's policies or procedures on society or the environment.
Figure 2: Levels of Evaluation Criteria (adapted from Kirkpatrick, 1998)
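The taxonomy in Figure 2 can be summarized as a simple lookup table, as in the Python sketch below. The grouping into immediate, intermediate, and ultimate criteria follows the surrounding text; the labels for Levels 2, 5, and 6 are inferred from context rather than stated outright, so treat them as an interpretation.

```python
# Levels of evaluation criteria (adapted from Kirkpatrick, 1998).
# Levels 1-2 are immediate, 3-4 intermediate, 5-6 ultimate; labels for
# levels 2, 5, and 6 are inferred from the discussion, not quoted from it.
EVALUATION_LEVELS = {
    1: ("immediate", "reaction", "What do participants think of the program?"),
    2: ("immediate", "learning", "Can participants articulate what they learned?"),
    3: ("intermediate", "behavior", "Is the desired behavior shown on the job?"),
    4: ("intermediate", "results", "Did changed behavior produce the expected results?"),
    5: ("ultimate", "return on investment", "Did the program pay for itself?"),
    6: ("ultimate", "societal/environmental impact", "What was the long-term broader effect?"),
}

for level, (timing, name, question) in EVALUATION_LEVELS.items():
    print(f"Level {level} ({timing}): {name} - {question}")
```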
Organizations, however, typically need more immediate measures of success in order to determine whether or not their programs and policies have had the desired effect. Therefore, although ultimate criteria of success are of interest, intermediate criteria (Levels 3 and 4) are often used instead. Intermediate criteria estimate the value of the program, policy, or initiative earlier in the process than do ultimate criteria. Although they cannot be used to state the ultimate impact of the program with the same degree of certainty as an ultimate criterion, intermediate criteria can be used to extrapolate the ultimate degree of success of the program based on its performance after a reasonable amount of time.
For the example of evaluating the effectiveness of a training program, an organization might collect data to determine whether the trainees demonstrated the desired behaviors once they were back in the workplace. Another intermediate criterion of training success would be whether the changed workplace behavior actually brought about the desired or expected results. For example, at the behavior level, a program might be a success because it results in changed behavior (a new sales pitch) on the job. However, at the results level, it might be a failure because the changed behavior did not bring about the expected results (more sales). When designing an evaluation research study using intermediate criteria, it is therefore important to determine the real measure of success (i.e., changed behavior or better results).
Sometimes, organizations need to know more immediately how successful a program is. If employees are unable to articulate what they learned in a training course or what new policies or processes they need to implement, then it is unlikely that they will be able to change their behavior on the job or that the organization will see the desired results. Therefore, immediate criteria of success (Levels 1 and 2) are often used. Level 1 criteria are reaction criteria that capture what the employee thinks of the training program, new policy, or other initiative right after it has taken place. These are often qualitative data that cannot be expressed in numerical form in any meaningful way for statistical analysis, and such subjective reactions by themselves do not tell the organization much about the effectiveness of the program. Most meaningful criteria of success need to be quantitative (i.e., measured and expressed in numerical form, such as physical dimensions or rating scales) so that they can be analyzed and meaningfully interpreted.
Conclusion
To stay competitive in today's rapidly changing market, organizations need to constantly try new things in order to maintain or grow their market share. The range of possible initiatives is endless, including new training programs, motivational programs, customer service policies, and business processes. No matter the program, policy, or initiative, however, all have one thing in common: they need to meet their stated goals. If they do not, the initiative will not be a success, even if it is accepted by the employees.
However, one cannot simply implement a new initiative and consider it a success just because all the steps of the implementation plan have been taken. New initiatives need to be evaluated to determine their effectiveness. Properly designed evaluation research can provide this information, as well as feedback about how the initiative can be improved. Although some might think evaluation research an unnecessary step, believing they can judge the effectiveness of a new initiative by the smiles on their employees' or customers' faces, it is only through evaluation research that a program's effectiveness can be determined and objective feedback concerning ways to improve it can be gathered.
Terms & Concepts
Business Model: The paradigm under which an organization operates and does business in order to accomplish its goals. Business models include consideration of what value is offered to the marketplace, building and maintaining customer relationships, an infrastructure that allows the organization to produce its offering, and the income, cash flow, and cost structure of the organization.
Business Process: Any of a number of linked activities that transform an input to the organization into an output that is delivered to the customer. Business processes include management processes, operational processes (e.g., purchasing, manufacturing, marketing), and supporting processes (e.g., accounting, human resources).
Criterion: A dependent or predicted measure that is used to judge the effectiveness of persons, organizations, treatments, or predictors. The ultimate criterion measures effectiveness after all the data are in. Intermediate criteria estimate this value earlier in the process. Immediate criteria estimate this value based on current values.
Evaluation Research: The application of scientific methodology and research techniques to determine whether or not a program has met its stated goals. Evaluation research is also often used to determine whether or not a program or policy has had negative impact or unexpected consequences that need to be addressed.
High Performing Organizations: Businesses that consistently outperform their competitors.
Motivation: An internal process that gives direction to, energizes, and sustains an organism's behavior. Motivation can be internal (e.g., I am hungry so I eat lunch) or external (e.g., the advertisement for the ice cream cone is attractive so I buy one).
Pay for Performance: An incentive plan in which employees are rewarded financially for high performance and contributing to the organization's goals. Pay for performance plans are applicable to all levels within the organization.
Policy: In a business setting, a policy is a set of principles and guidelines based on an analysis of the organization's goals, objectives, resources and plans. Policies are set by the organization's governing body (board of directors) and are used to develop strategy and guide decision making in support of meeting the organization's goals and objectives.
Psychometrics: The science and process of mental measurement. The science of psychometrics comprises both the theory of mental measurement and the methodology for adequately and accurately capturing an individual's intangible attitude or opinion.
Return on Investment (ROI): A measure of the organization's profitability or how effectively it uses its capital to produce profit. In general terms, return on investment is the income that is produced by a financial investment within a given time period (usually a year). There are a number of formulas that can be used in calculating ROI. One frequently used formula is ROI = (profits - costs) / costs x 100. The higher the ROI, the more profitable the organization.
Scientific Method: A cornerstone of organizational behavior theory in which a systematic approach is used to understand some aspect of behavior in the workplace by individuals, teams, or organizations. The scientific method is based on controlled and systematic data collection, interpretation, and verification in a search for reproducible results. In organizational behavior theory, the goal is to be able to apply these results to real world applications.
Statistics: A branch of mathematics that deals with the analysis and interpretation of data. Mathematical statistics provides the theoretical underpinnings for various applied statistical disciplines, including business statistics, in which data are analyzed to find answers to quantifiable questions. Applied statistics uses these techniques to solve real world problems.
Survey: (a) A data collection instrument used to acquire information on the opinions, attitudes, or reactions of people; (b) a research study in which the opinions, attitudes, or reactions of members of a selected sample are gathered using a survey instrument or questionnaire for purposes of scientific analysis, typically with the results extrapolated from the sample to the underlying population; (c) to administer such an instrument to a sample.
Bibliography
Bansal, H. S., & Duverger, P. (2013). Investigating the measures of relative importance in marketing research. International Journal of Market Research, 55, 675-694. Retrieved November 15, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=90333599&site=ehost-live
Center, A. H., & Broom, G. M. (1983). Evaluation research. Public Relations Quarterly, 28, 2-3. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=4479896&site=ehost-live
Gregory, D., & Martin, S. (1994). Crafting evaluation research in the public sector: Reconciling rigour and relevance. British Journal of Management, 5, 43-52. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=4527004&site=ehost-live
Hsee, C. K., Jiao, Z., Liangyan, W., & Zhang, S. (2013). Magnitude, time, and risk differ similarly between joint and single evaluations. Journal of Consumer Research, 40, 172-184. Retrieved November 15, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=87598550&site=ehost-live
Kirkpatrick, D. L. (1998). Evaluating training programs: The four levels (2nd ed.). San Francisco: Berrett-Koehler.
Struening, E. L., & Guttentag, M. (Eds.). (1975). Handbook of evaluation research. Beverly Hills, CA: Sage Publications.
Van Osselaer, S. J., & Janiszewski, C. (2012). A goal-based model of product evaluation and choice. Journal of Consumer Research, 39, 260-292. Retrieved November 15, 2013, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=77741213&site=ehost-live
Suggested Reading
Archer, M. A. (2009). Authentic teaming: Undiscussables, leadership and the role of the consultant. Organization Development Journal, 27, 83-92. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=48656208&site=ehost-live
Bennear, L. S., & Coglianese, C. (2005). Measuring progress. Environment, 47, 22-39. Retrieved May 19, 2010, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=16140797&site=ehost-live
Burgoyne, J. G. (1973). An action research experiment in the evaluation of a management development course. Journal of Management Studies, 10, 8-14. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=4554755&site=ehost-live
Cohen, R., Kincaid, D., & Childs, K. E. (2007). Measuring school-wide positive behavior support implementation: Development and validation of the benchmarks of quality. Journal of Positive Behavior Interventions, 9, 203-213. Retrieved May 19, 2010, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=26579762&site=ehost-live
Cunningham, B. M. (2008). Using action research to improve learning and the classroom learning environment. Issues in Accounting Education, 23, 1-30. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=28773926&site=ehost-live
Hawthorne, G., Richardson, J., & Osborne, R. (1999). The Assessment of Quality of Life (AQoL) instrument: A psychometric measure of health-related quality of life. Quality of Life Research, 8, 209-224. Retrieved May 19, 2010, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=11501188&site=ehost-live
Koca, L. C., & Rumrill, P. D. (2008). Assessing consumer satisfaction in rehabilitation and allied health care settings. Work, 31, 357-363. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=35268109&site=ehost-live
Morrow, C. C., Jarrett, M. Q., & Rupinski, M. T. (1997). An investigation of the effect and economic utility of corporate-wide training. Personnel Psychology, 50, 91-119. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=9703303255&site=ehost-live
Patel, L. (2010). Overcoming barriers and valuing evaluation. Training and Development, 64, 62-63. Retrieved May 19, 2010, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=47875719&site=ehost-live
Pearson, L. C., & Carey, L. M. (1995). The academic motivation profile for undergraduate student use in evaluating college courses. Journal of Educational Research, 88, 220-227. Retrieved May 19, 2010, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=9506193723&site=ehost-live
Stead, V. (2004). Business-focused evaluation: A case study of a collaborative model. Human Resource Development International, 7, 39-56. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=12511372&site=ehost-live
Stewart, D., Law, M., Russell, D., & Hanna, S. (2004). Evaluating children's rehabilitation services: An application of a programme logic model. Child: Care, Health & Development, 30, 453-462. Retrieved May 19, 2010, from EBSCO Online Database Academic Search Complete. http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=14228405&site=ehost-live
Tallon, P. P., & Kraemer, K. L. (2006). The development and application of a process-oriented "thermometer" of IT business value. Communications of AIS, 17, 2-51. Retrieved May 19, 2010, from EBSCO Online Database Business Source Complete. http://search.ebscohost.com/login.aspx?direct=true&db=bth&AN=22439524&site=ehost-live