Call Scoring Using the Scientific Method

Using The Scientific Method to Create an Effective Call Score Rubric

Call scoring is tricky. Statistically valid call scoring is trickier. And because call scoring is a science, best practices should remain the same across every call center, even if the conversation topics vary from organization to organization. Unfortunately, call scoring tends to fluctuate from contact center to contact center, which is something we hope the scientific method can partially fix.

For context, the scientific method is a set of best practices that all scientists agree upon, regardless of their specialization. It outlines how scientists should establish and test a hypothesis – systematically observing, measuring, testing, and experimenting so that others can replicate their results. Call scoring, being scientific in and of itself, should follow the same formula.

With that in mind, we apply the scientific method to call scoring:

 1) Ask a Question:

 The first step to effective call scoring is defining the question, or set of questions, that will inform how the call score weights different variables. These questions can be simple or more complex; here are some examples by use case:

Example Question | Industry
“Does my chance of closing a sales call increase as the conversation gets longer?” | Sales
“Does a customer service agent’s enthusiasm play a role in agent call performance?” | Customer Service
“How much more likely is a consumer complaint to occur when a collector fails to read the Mini-Miranda at the beginning of the call?” | Debt Collections
“How does my agent’s talk speed affect call results?” | Compliance & Quality Assurance


 2) Construct a Hypothesis Underlying the Call Score:

 Your call score hypothesis should follow this logic: “If [we do this] then [this result] will occur.”

 Here are some examples:

 “For every minute that elapses where a rep doesn’t ask to schedule an installation, the chances that she schedules an installation decrease by 12%.”

 “Every time an agent talks for more than 70% of the call, the chances he makes a sale diminish by 35%.”

 “When a call center agent fails to read a legal disclaimer at the beginning of the call, the chances we receive a customer complaint increase by 23%.”

 Again, you don’t need to know that these hypotheses are true – you simply need to make an informed hypothesis based on your actual observations in your call center.
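 A hypothesis such as “complaints increase by 23%” is easier to test once it is pinned to two measurable rates. The sketch below is only an illustration: the function names and the call counts are made up, and it assumes the 23% is meant as a relative change rather than a change in percentage points.

```python
def rate(events: int, calls: int) -> float:
    """Share of calls on which the event (e.g. a complaint) occurred."""
    return events / calls


def relative_change(baseline: float, observed: float) -> float:
    """Change in a rate, expressed as a fraction of the baseline rate."""
    return (observed - baseline) / baseline


# Hypothetical counts, not real call center data.
complaints_with_disclaimer = rate(100, 1000)     # disclaimer was read
complaints_without_disclaimer = rate(123, 1000)  # disclaimer was skipped

change = relative_change(complaints_with_disclaimer, complaints_without_disclaimer)
point_difference = complaints_without_disclaimer - complaints_with_disclaimer

print(f"Relative change: {change:.0%}")
print(f"Difference in percentage points: {point_difference * 100:.1f}")
```

 Stating up front which of the two numbers the hypothesis refers to avoids arguing about it after the data comes in.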

 Once you have a call scoring hypothesis outlined, the next step is to test it.

 3) Test your Call Scoring Hypothesis Through Experimentation:

 Drumming up hypotheses is easy; running a statistically valid call scoring experiment is more challenging. For an accurate call score, you’ll need to remove confounding variables. A confounding variable is a variable that affects both the independent variable (the thing that changes the call’s result; for example, a rep’s tone on a sales call) and the dependent variable (the actual call result; for instance, whether or not a rep makes a sale).

If you don’t remove all confounding variables, your call score will be an inaccurate measurement of conversation quality.
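To make the definition concrete, here is a minimal sketch with made-up counts. It assumes a hypothetical confounder, lead quality: a “warm” lead both puts the rep in a more upbeat tone and is more likely to buy. The field names and numbers are illustrative only, not real call data.

```python
from collections import defaultdict


def make_calls(upbeat_tone, warm_lead, total, sales):
    """Build `total` hypothetical call records, the first `sales` of which closed."""
    return [
        {"upbeat_tone": upbeat_tone, "warm_lead": warm_lead, "sale": i < sales}
        for i in range(total)
    ]


def sale_rate_by_tone(calls):
    """Sale rate for upbeat (True) and flat (False) calls."""
    groups = defaultdict(list)
    for call in calls:
        groups[call["upbeat_tone"]].append(call["sale"])
    return {tone: sum(sales) / len(sales) for tone, sales in groups.items()}


# Made-up data: warm leads close 75% of the time and cold leads 25%,
# regardless of tone, but reps are upbeat more often on warm leads.
calls = (
    make_calls(upbeat_tone=True, warm_lead=True, total=8, sales=6)
    + make_calls(upbeat_tone=False, warm_lead=True, total=4, sales=3)
    + make_calls(upbeat_tone=True, warm_lead=False, total=4, sales=1)
    + make_calls(upbeat_tone=False, warm_lead=False, total=8, sales=2)
)

# Ignoring lead quality, tone appears to matter.
overall = sale_rate_by_tone(calls)

# Holding lead quality fixed, the apparent effect of tone disappears.
within_lead_type = {
    warm: sale_rate_by_tone([c for c in calls if c["warm_lead"] == warm])
    for warm in (True, False)
}

print("All calls:", overall)
print("Warm leads only:", within_lead_type[True])
print("Cold leads only:", within_lead_type[False])
```

In this toy data, upbeat calls close more often overall, yet within each lead type the sale rate is identical for both tones. Scoring reps on tone here would really be scoring them on the leads they were handed.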

 Here is an example. Let’s say that a sales or customer service manager theorizes the following:

 “Every time a rep mentions a watermelon, the chances that the call ends successfully drop by 70%.”

 Since this call center sells auto warranties, this seems like an incredibly reasonable hypothesis and the call center manager decides to test it.

 He listens to 30 calls that end unsuccessfully, noting that “watermelon” was mentioned on 16 of the calls. Inevitably, he concludes, watermelon-based language must play a role in the result.

 But does it? Let’s unpack it:

 1) Did the manager listen to 30 calls from each rep, or 30 calls in total? If he only listened to 30 calls from a single rep, he hasn’t proven watermelon leads to a negative result across the entire call center; instead, he’s merely illustrated that watermelon affects a specific rep negatively. Perhaps another call center representative could mention the fruit with only positive results to show for it.

Though listening to a single call center rep’s calls could indicate a team-wide pattern, further testing across a broader set of contact center agents is necessary to fully validate the hypothesis.

 2) Did the manager clearly define watermelon language? Perhaps the manager wrote the call center script and, naturally, likes how it sounds. He is then more likely to flag conversations as watermelon-related when a rep deviates from the script, even if another manager (or customer) wouldn’t consider the language watermelon-related at all. As a result, the manager has accidentally allowed personal bias to affect the call score, invalidating the results.

 3) Next, it appears the manager only listened to 30 calls that ended unsuccessfully. In reality, he must also examine 30 calls that ended successfully. In doing so, it’s possible he would discover that 16 successful calls also contained the word watermelon. If that’s the case, it suggests that such language has no bearing on call results and shouldn’t influence the call score.

 4) In the 16 calls where watermelon language occurred, it’s possible that another variable was simultaneously present. For example, perhaps the rep raises his voice when describing a watermelon. In this case, it’s possible that the “voice raise,” not the word itself, influenced call results. For this reason, randomizing the data set is crucial – it removes potentially hidden variables that could make call scoring inaccurate.

 5) Finally, what exactly is a successful call? In sales calls, it tends to be more obvious: a rep makes a sale or books a meeting. But what about customer service calls? Or support calls? Perhaps the manager defines a successful call as one in which the rep gives a great answer to a customer question. But what constitutes a “great” answer? If the customer calls back a day later, was it truly a great answer? That’s why it’s important to tie a call score to a binary result – a result that either does or does not occur and isn’t open for interpretation.
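 The comparison described in point 3 can be sketched with a few lines of arithmetic. This is a rough illustration, not a full analysis: the function names are made up, the counts are the hypothetical ones from the example (16 of 30 unsuccessful calls and 16 of 30 successful calls mention watermelon), and Fisher’s exact test is one reasonable choice of test for a small two-by-two table, not the only one.

```python
from math import comb


def mention_rate(calls_with_term: int, total_calls: int) -> float:
    """Share of calls in a group on which the term was mentioned."""
    return calls_with_term / total_calls


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table
        [[a, b],   e.g. unsuccessful calls: with term, without term
         [c, d]]        successful calls:   with term, without term
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Number of ways to get x "with term" calls in the first row,
        # keeping all row and column totals fixed.
        return comb(col1, x) * comb(n - col1, row1 - x)

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    observed = weight(a)
    # Sum every table that is at least as unlikely as the observed one.
    as_extreme = sum(
        weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed
    )
    return as_extreme / comb(n, row1)


unsuccessful_rate = mention_rate(16, 30)
successful_rate = mention_rate(16, 30)
p_value = fisher_exact_two_sided(16, 14, 16, 14)

print(f"Mentioned on {unsuccessful_rate:.0%} of unsuccessful calls")
print(f"Mentioned on {successful_rate:.0%} of successful calls")
print(f"p-value: {p_value:.3f}")  # 1.000 here: the two groups are indistinguishable
```

 With identical mention rates in both groups, the data gives no reason to think watermelon language affects the result. A much lower rate among successful calls (say 4 of 30) would produce a small p-value and be worth a closer look.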

 For these reasons, call analytics can be very helpful, because they remove (or mostly remove) human variability and interpretation.
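 One way to take interpretation out of points 2 and 5 is to write the definitions down as fixed rules before reviewing any calls. The sketch below is a simplified illustration: the pattern, the function names, and the `sale_made` field are assumptions for this example, not how any particular analytics product works.

```python
import re

# Written definition of "watermelon language": the whole word, singular or
# plural, in any capitalization. Anything else does not count.
WATERMELON_PATTERN = re.compile(r"\bwatermelons?\b", re.IGNORECASE)


def mentions_watermelon(transcript: str) -> bool:
    """Apply the same rule to every transcript, whoever is reviewing it."""
    return WATERMELON_PATTERN.search(transcript) is not None


def is_successful(call: dict) -> bool:
    """Binary outcome: the call either produced a sale or it did not.

    `sale_made` is a hypothetical field name for this example.
    """
    return call["sale_made"] is True


call = {
    "transcript": "It is about the size of a Watermelon, honestly.",
    "sale_made": False,
}
print(mentions_watermelon(call["transcript"]), is_successful(call))
```

 Because the rule is explicit, two managers reviewing the same transcript will always flag it the same way, and the definition itself can be debated and revised in the open.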

4) Communicate the Call Score to the Team:

Let’s say that a successful call result is clearly defined, the data set is representative of the entire organization, and you’ve removed all confounding variables from the experiment. We can now safely conclude that the call score is valid.

 Now what? Designing a call score is hard enough; getting an entire call center team to change their speaking habits can be even trickier. After all, human beings are human beings; we don’t always like to change how we do things.

 That’s where immediate reinforcement comes in. Immediate reinforcement – giving feedback immediately after a trigger event – is the single most effective way to adjust conversation habits.

 Various call analytics solutions can be helpful here; tools like Balto automatically detect trigger events (like watermelon) and instantly alert reps to use specific language, which helps change reps’ habits on the phones. They can also manage call scoring and display suggestions to reps live during the call.

Of course, call analytics may or may not be in the cards for your call center right now. In that case, there are a few manual options that can help promote call score adherence:

  • Spot check live calls, giving coaching feedback immediately after the call when the context is still fresh in the call center rep’s mind.
  • If real-time feedback isn’t possible, identify a single, predictable mistake for each rep, then coach on that particular skill; this can quickly increase call quality.
  • Focus call coaching on the beginning of the call – specifically the portions that occur most often – as they will have an outsized impact on call results.

I hope this helps! As always, please don’t hesitate to reach out if you have any questions.