Tools of the OBM Trade: Measure to Evaluate Solutions

By Barbara Bucklin, PhD

bSci21 Contributing Writer

This is the third in a three-part series called ‘Tools of the OBM Trade,’ intended to give you practical advice, examples, tools, and resources to hone your OBM skills. The first two articles appeared in Behavior Analysis Quarterly. Part 1, called ‘Pinpoint and Analyze the Problem,’ answered the question, “What do you want to change, and why isn’t the ‘right’ performance happening in the first place?” (Bucklin, 2016). Part 2, called ‘Implement Solutions,’ answered the question, “How do you design and implement OBM solutions to improve the pinpointed behaviors and results?” (Bucklin, 2017). In this final article, I’ll discuss how to measure and evaluate your solution in the ‘messy’ real work world to ensure your OBM solutions are making a difference.

As behavior analysts, we’re no strangers to measurement. If you work in a clinical setting or in a research laboratory, it’s the backbone of your behavior analytic interventions. Although we behavior analysts understand this, we often forget to apply the same scientific rigor to our OBM solutions. In Part 2, I discussed how we should define our metrics before beginning an OBM intervention by asking questions such as:

  • What behaviors and results do we expect? These should tie back to our pinpoints and identify metrics we’ll use, for example, increase monthly client retention or improve client satisfaction survey results.
  • How will we measure and evaluate these behaviors and results? For example, will we use existing client satisfaction surveys, behavioral observations, sales data, client retention data, or profitability data?
  • How often will we measure? The more frequent, the better; we should measure at least monthly.
  • Will we start with baseline data? We should answer yes and figure out how we’ll collect baseline data.
  • Will we measure behaviors and results? We should measure both.
  • How will we compute our metrics? Examples include…
    • Quality measured by number of phone calls that include all checklist items
    • Quantity measured by number of times supervisors deliver positive feedback to staff members
    • Timeliness measured by the time elapsed between a client or family phone inquiry and the return phone call
  • What else might we measure? We should gather input from stakeholders and measure everything that’s important and related to our solution.  

If we start by answering these questions, we’ll be in great shape to evaluate our solution and make ongoing changes if we don’t see the behaviors and results we expected.
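
As a rough illustration of the "How will we compute our metrics?" question, the three example metrics above could be computed along these lines. This is only a minimal sketch: the checklist items, field names, supervisor label, and all sample data are hypothetical and would need to be replaced with whatever your organization actually records.

```python
from datetime import datetime
from statistics import mean

# Hypothetical checklist items for a client phone call (illustrative only).
CHECKLIST = ("greeting", "verified_identity", "answered_question", "next_steps")


def calls_meeting_checklist(calls):
    """Quality: number of calls in which every checklist item was observed."""
    return sum(
        1 for call in calls
        if all(call.get(item, False) for item in CHECKLIST)
    )


def positive_feedback_count(feedback_log, supervisor):
    """Quantity: number of times a supervisor delivered positive feedback."""
    return sum(
        1 for entry in feedback_log
        if entry["supervisor"] == supervisor and entry["positive"]
    )


def mean_callback_hours(inquiries):
    """Timeliness: mean hours between a phone inquiry and the return call."""
    return mean(
        (returned - received).total_seconds() / 3600
        for received, returned in inquiries
    )


# Made-up observation data, for illustration only.
calls = [
    {"greeting": True, "verified_identity": True,
     "answered_question": True, "next_steps": True},
    {"greeting": True, "verified_identity": False,
     "answered_question": True, "next_steps": True},
    {"greeting": True, "verified_identity": True,
     "answered_question": True, "next_steps": True},
]
feedback_log = [
    {"supervisor": "Supervisor A", "positive": True},
    {"supervisor": "Supervisor A", "positive": False},
    {"supervisor": "Supervisor B", "positive": True},
    {"supervisor": "Supervisor A", "positive": True},
]
inquiries = [
    (datetime(2017, 3, 1, 9, 0), datetime(2017, 3, 1, 11, 0)),
    (datetime(2017, 3, 2, 14, 0), datetime(2017, 3, 2, 18, 0)),
]

quality = calls_meeting_checklist(calls)
quantity = positive_feedback_count(feedback_log, "Supervisor A")
timeliness = mean_callback_hours(inquiries)

print(f"Calls meeting all checklist items: {quality} of {len(calls)}")
print(f"Positive feedback delivered by Supervisor A: {quantity}")
print(f"Mean time to return a call: {timeliness:.1f} hours")
```

Writing the computation down this explicitly, even on paper, forces us to decide exactly what counts as a 'complete' call or a 'positive' feedback event before we start collecting data.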

Those familiar with instructional design and training solutions will have heard about Kirkpatrick’s four levels of evaluation (Kirkpatrick & Kirkpatrick, 2006). While the Kirkpatrick model has its flaws, it’s a simple and logical framework for considering different levels of evaluation, which help answer, ‘Was our solution successful?’ Most training and performance improvement specialists you encounter will be familiar with it, which is a great reason to add it to your tool box and to understand its strengths and limitations. I’ll quickly outline the Kirkpatrick levels here and add more detail around each level, including suggestions from a behavioral perspective.

Level 1 Evaluation: Employee Reaction or Social Validity

Level 1 evaluates satisfaction with, or reaction to, the solution and is typically measured in a self-report survey format. We ask, ‘Were our employees and stakeholders satisfied with the OBM intervention they experienced?’ In traditional training, these are often called ‘smile sheets.’ As OBM-ers we may call them social validity checks. If our intervention is ‘socially valid,’ the process and results were acceptable, relevant, and useful to individuals who participated in and/or were impacted by the OBM solution. While it’s important to gather these types of social validity data for employee satisfaction, generally there’s no correlation between improved results and satisfaction (Sitzmann, Brown, Casper, Ely, & Zimmerman, 2008).

While I’m not suggesting we ignore participants’ satisfaction with our OBM solutions – it can be important information to have – there are ways to capture data more relevant to performance results. Will Thalheimer’s (2016) book Performance-Focused Smile Sheets: A Radical Rethinking of a Dangerous Art Form illustrates how to write better performance-based questions that can help us design more relevant and impactful OBM and learning solutions. These surveys should target participants’ ability to:

  • Apply what they learned to their work because they acquired a new skill
  • Fully describe what was presented to them because they acquired new knowledge
  • Implement what they learned because their work environment supports it
  • Use the training as an ongoing resource that will guide them to continue putting their learning into practice

In the hypothetical example we explored in Parts 1 and 2, we would want to design Level 1 satisfaction and social validity questions for the overall ‘blended’ OBM solution as well as for each of the components: Process, Work Environment, and Performer. As we develop these survey questions, we need to make them meaningful and concise. Data collected by SurveyMonkey® (Brent, 2011) show that the longer a survey is, the less time people spend on each question. On surveys longer than 30 questions, respondents spend around half as much time on each question as they do on surveys shorter than 30 questions.

Level 2 Evaluation: Learning

Although it’s arguably the least important evaluation level, I won’t leave out Level 2 knowledge and skill acquisition measures. Problems with this type of evaluation include poorly written test questions, obvious answer choices, and post-learning-only metrics without comparison data. If we decide to use a pre- and post-test to measure the training components of our OBM solution, we should evaluate knowledge gain and behavior change in meaningful ways. This means we’ll need to measure what matters. If we want learners to discriminate between answer choices on a multiple-choice test, then we should evaluate with that type of test (I’m being sarcastic because I can’t think of a single training solution with that objective, so we’ll have to do better). If we write performance-based objectives with action verbs, we’ll be able to measure those learning outcomes. Examples include: define, list, state, compute, describe, apply, demonstrate, use, summarize, develop, organize, and recommend. For instance, to test learning outcomes with our staff members, we could grade their responses to simulated client phone calls using a checklist. I’ve also developed online simulations as pre- and post-test assessments, which attempt to replicate the job as much as possible.
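
To make the checklist-graded pre- and post-test idea concrete, here is a minimal sketch of how the scores might be tallied. The checklist items, staff labels, and observations are all hypothetical; the point is only that each learner gets a pre-test score to compare the post-test score against.

```python
# Hypothetical checklist for a simulated client phone call (illustrative only).
CALL_CHECKLIST = ("greeting", "verified_identity", "answered_question", "next_steps")


def checklist_score(observed, checklist=CALL_CHECKLIST):
    """Percent of checklist items the learner demonstrated in one simulated call."""
    demonstrated = sum(1 for item in checklist if item in observed)
    return 100 * demonstrated / len(checklist)


def score_gains(pre, post, checklist=CALL_CHECKLIST):
    """Per-learner change in checklist score from pre-test to post-test.

    Only learners who completed both assessments are included.
    """
    return {
        name: checklist_score(post[name], checklist) - checklist_score(pre[name], checklist)
        for name in pre
        if name in post
    }


# Made-up grading results: the set of checklist items each learner demonstrated.
pre_test = {
    "staff_1": {"greeting"},
    "staff_2": {"greeting", "answered_question"},
}
post_test = {
    "staff_1": {"greeting", "verified_identity", "answered_question", "next_steps"},
    "staff_2": {"greeting", "answered_question", "next_steps"},
}

gains = score_gains(pre_test, post_test)
for name, gain in gains.items():
    print(f"{name}: {checklist_score(pre_test[name]):.0f}% -> "
          f"{checklist_score(post_test[name]):.0f}% (gain {gain:+.0f} points)")
```

Without the pre-test column, a post-test score of 75% would tell us nothing about whether the training itself produced the skill.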

Resources are few and far between to guide OBM and instructional design professionals in their attempts to write meaningful Level 2 evaluations. One decent option is Criterion-referenced Test Development (Shrock & Coscarelli, 2007), which outlines methods for test construction, test scoring, and establishing reliability and validity. The book advocates for scenario-based questions, simulations, and real-world skills tests.

Levels 3 and 4 Evaluation: Behavior Change and Accomplishments/Business Results

Levels 3 and 4 are the evaluation levels that tell us whether our OBM solution made a difference. Did our pinpointed behaviors improve? Did those behavior changes correlate with valuable business results? As I mentioned earlier, it’s important to start at the end by defining clear metrics, which can demonstrate value to stakeholders.

Here are some tips that will help you measure OBM success ‘in the real world.’

  • Capture results data the organization already measures, such as customer satisfaction, sales, or client retention. If data are already being collected, they’re most likely important to the organization. It’ll also make your job easier if you don’t have to identify how to collect every metric from scratch.
  • Integrate your measurement plan with managers’ coaching and feedback. When managers use checklists to monitor behavior, capture those data and use them as part of ongoing measurement and evaluation. I recommend an electronic Observation Checklist to capture and store behavioral observation data.
  • Establish measurement standards so it’s clear what you’re measuring; this lets you and others replicate your solution and determine whether you’re getting the same results (Binder, 2001).
  • If you can, use an experimental design or quasi-experimental design. This is how you’ll attempt to rule out other variables, which is tricky to do in the real world. We know our OBM data will never be as clean as laboratory data, but you might consider a between-group design if you can randomly assign some staff and not others to participate. Or consider a multiple baseline where you start some participants sooner than others. At a minimum, be sure you collect metrics before (baseline) and after your solution implementation.
  • Include a time dimension as part of each metric. It’s almost always important to know the time period in which a behavior or result occurred. Accurate and quick performance is different from accurate and slow performance. For example, accurately answering a client’s question in one minute is different from accurately answering the same question in 30 minutes after looking up the information (Binder, 2001).
  • If you calculate percentages or ratios as one of your metrics, also include the performance data from which they are derived. Retaining 80% of 1,000 clients is different from retaining 80% of 5 clients (Binder, 2001).
  • Remember that change takes time; continue measuring for an acceptable time period for your OBM solution to prove its worth, and continue measuring after that.
  • If you have the resources, build an ongoing measurement plan that becomes integrated into the organization. Data can and should be used to provide regular feedback, and you’ll know right away if behavior or results start deteriorating.
  • Make performance data directly available to participants. Although we hope our managers and supervisors share feedback with their staff, we know it doesn’t always happen in the real world even with the best training and contingencies in place. I’ve learned from performance needs analyses that, on average, less than 40% of employees receive performance feedback from their managers. To solve this issue, we might develop a ‘Performance Dashboard’ for staff that allows them to see their personal behavior and results feedback such as observation data, customer satisfaction scores, average time to return phone calls, and other metrics relevant to their job roles. While this type of dashboard doesn’t replace direct management feedback, it can help employees improve on their own.
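
Several of these tips (collect baseline data, attach a time period to each metric, and report the raw counts behind every percentage) can be combined in one small record structure. The following is only a sketch under stated assumptions: the class, field names, month labels, and client counts are invented for illustration, and a simple before-and-after comparison like this does not by itself rule out other variables.

```python
from dataclasses import dataclass


@dataclass
class RetentionPeriod:
    """One month of client retention data (hypothetical structure)."""
    label: str             # time dimension, e.g. "2017-01"
    phase: str             # "baseline" or "intervention"
    clients_start: int     # raw count the percentage is derived from
    clients_retained: int

    @property
    def retention_pct(self):
        return 100 * self.clients_retained / self.clients_start


def phase_summary(periods, phase):
    """Pool raw counts across a phase so the percentage keeps its denominator."""
    rows = [p for p in periods if p.phase == phase]
    if not rows:
        raise ValueError(f"no data recorded for phase {phase!r}")
    start = sum(p.clients_start for p in rows)
    retained = sum(p.clients_retained for p in rows)
    return {
        "months": len(rows),
        "clients_start": start,
        "clients_retained": retained,
        "retention_pct": 100 * retained / start,
    }


# Made-up monthly data: two baseline months, then two intervention months.
periods = [
    RetentionPeriod("2017-01", "baseline", 200, 140),
    RetentionPeriod("2017-02", "baseline", 200, 150),
    RetentionPeriod("2017-03", "intervention", 200, 160),
    RetentionPeriod("2017-04", "intervention", 200, 170),
]

baseline = phase_summary(periods, "baseline")
intervention = phase_summary(periods, "intervention")

for name, s in (("Baseline", baseline), ("Intervention", intervention)):
    print(f"{name}: {s['retention_pct']:.1f}% retained "
          f"({s['clients_retained']} of {s['clients_start']} clients "
          f"over {s['months']} months)")
```

Reporting "82.5% (330 of 400 clients over 2 months)" rather than "82.5%" alone is what lets a stakeholder tell a meaningful change from one based on a handful of clients.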

Next Steps

Now that you’ve read all three parts of this ‘Tools of the OBM Trade’ series, it’s time for you to start applying what you’ve learned. Please keep in touch and let me know how I can help you pinpoint and analyze the problem, design and implement your OBM solution, and evaluate it in the real work world. Good luck!



Binder, C. (2001). Measurement: A few important ideas. Performance Improvement, 40 (3), 20-28.

Brent, C. (2011). How much time are respondents willing to spend on your survey? Retrieved from

Bucklin, B.R. (2016). Tools of the OBM trade part 1: Pinpoint and analyze the problem. Behavior Analysis Quarterly, 2 (4), 15-19.

Bucklin, B.R. (2017). Tools of the OBM trade part 2: Implement solutions. Behavior Analysis Quarterly, 3 (1), 8-16.

Kirkpatrick, D. L., & Kirkpatrick, J. D. (2006). Evaluating training programs (3rd ed.). San Francisco, CA: Berrett-Koehler Publishers.

Shrock, S. A., & Coscarelli, W. C. (2007). Criterion-referenced test development: Technical and legal guidelines for corporate training and certification (3rd ed.). San Francisco, CA: John Wiley and Sons.

Sitzmann, T., Brown, K. G., Casper, W. J., Ely, K., & Zimmerman, R. D. (2008). A review and meta-analysis of the nomological network of trainee reactions. Journal of Applied Psychology, 93, 280-295.

Thalheimer, W. (2016). Performance-focused smile sheets: A radical rethinking of a dangerous art form. Somerville, MA: Work-Learning Press.

Barbara Bucklin, PhD is a global learning and performance improvement leader with 20 years of experience who collaborates with her clients to identify performance gaps and recommend solutions that are directly aligned with their core business strategies. She oversees design and development processes for learning (live and virtual), performance-support tools, performance metrics, and a host of innovative blended solutions.

Dr. Bucklin serves as President Elect and is on the Board of Directors for the Organizational Behavior Management Network. She has taught university courses in human performance technology, the psychology of learning, organizational behavior management, and statistical methods. Her research articles have appeared in Performance Improvement Quarterly and the Journal of Organizational Behavior Management. She presents her research and consulting results at international conventions such as the Association for Talent Development (ATD), International Society for Performance Improvement (ISPI), Training Magazine’s Conference and Expo, and the Organizational Behavior Management Network.
