The Ultimate Guide on How to Measure Chatbot Effectiveness

In this post, you are going to learn how to measure the effectiveness of a chatbot through the use of KPIs and other data insights.

Consumers are welcoming messaging bots, helped along by the popularity of voice bots, and most businesses are becoming aware of how chatbots could transform the way they operate. In the ideation phase, many are doing initial research and development to explore how they can leverage chatbots and the analytics they provide.

Before investing in the development (and maintenance) of a chatbot, you need to define success metrics, benchmarks, and KPIs. These create the feedback loop that tells you whether your chatbot is performing to the standards of your users, and therefore whether it is lifting your company’s bottom line or customer satisfaction rates.

While the industry hasn’t established standard KPIs and success metrics for chatbots, chatbot success can be measured in a multitude of ways. In addition, chatbots can be deployed on multiple engagement channels, and each of those channels may have its own success criteria.

Used effectively, chatbot performance metrics can help the chatbot become a welcome assistant to your business. Whether you are using a chatbot internally with employees or externally to nurture clients, there is a subset of metrics you can use to gauge the accuracy of your chatbot and to roadmap the next set of items to focus on.

In this article, I’m going to share with you the metrics that I use or plan on using in my own organization to prove out the value proposition of a chatbot and what it can do for the business.

Why Measure Success of a Chatbot

The speed of implementation, promise of reducing customer service resources through deflection, and the ability to provide a personalized experience to clients are big draws for implementing a chatbot. With any new tool, there should be goals and objectives to ensure that the technology is proving out what was promised during the evaluation phase of implementing it. If you can’t prove the value of implementing a chatbot, then there isn’t any data to support maintaining it and adding additional skills unless your company has unlimited funds.

Chatbot analytics provides valuable insight into opportunities for business growth and into your ability to retain users.

Chatbot Metrics for Each Additional Skill, Task or Function Added

Once you have an established chatbot, the project team and stakeholders will have a desire to continue to add more skills.

Before adding a new skill to the chatbot, define an objective or goal for it so that you can confirm it doesn’t degrade the chatbot’s current performance.

Categories of Success Metrics of a Chatbot and Accountability

There are several categories that benchmarks, KPIs, and data can fall under when evaluating the effectiveness of the implementation of a chatbot for a business.

In my own experience, not separating these categories of metrics and their respective reports often caused confusion about what result a given success metric was supposed to show from the business perspective.

Types of Chatbot Success Metrics

These are the three types of chatbot analytics and insights each KPI should be categorized under:

User Adoption Reporting
Performance Reporting
Business Value Reporting

With a chatbot, analytics and the data mining from the conversation logs are very important in ensuring that a feedback loop is created.

Accountability of Chatbot Metric Categories

Each type of metric has a different RACI matrix when it comes to monitoring and helping prioritize the next stories needed to improve the chatbot. Different people on the chatbot project team need to be held both accountable and responsible for measuring overall effectiveness, based on their specific roles.

User adoption metrics should be managed by the organization or teams that own the relationship with the chatbot’s audience. This will allow them to keep a pulse on the number of users that are actually using the chatbot. The people responsible for the relationship need to be accountable for ensuring that marketing keeps the adoption rate increasing, or at least for finding out why the chatbot is failing to engage and/or retain users. Users here can be employees for an internal chatbot or clients/potential clients for an externally facing chatbot.

Performance reporting should be managed by the development team, for example by a technical engineering architect. An engineer or technical developer will be accountable for metrics that measure response time, how skills perform as they scale to larger audiences, and whether the data connections and web services the chatbot relies on stay consistent as usage increases or decreases. This type of reporting can also hold a data scientist on your team accountable for ensuring that the language model being used accurately captures the intent of the user.

Business value reporting should be managed by the product owner, product manager, or executive overseeing the bot. This should be in line with the objectives and business goals that the company set out to achieve at the beginning of the project. Generally, these metrics come from both the bot database as well as any other databases for company KPIs in order to compare the increase in revenue based off of a certain skill’s usage or a decrease in the support ticket queue because of the increase in usage of a FAQ skill.

Measuring the Wrong Analytics in a Chatbot

With all the data and insights that a chatbot or conversational agent can provide, there is the potential of the project team measuring too many analytics and focusing on the wrong priorities due to the data.

What I have seen go wrong is that a platform may not have a certain way to capture the data, so project teams will create the ability to capture the metric without objectively looking at what implementing the capture of the metric will do for the business.

Ensure that the metrics you are measuring are worth the investment it takes to capture them.

Measuring Return on Investment

When businesses talk about measuring ROI with analytics, they can mean a number of different calculations: client satisfaction converting into repeat sales, employee engagement reducing turnover, or increased conversion rates from potential clients. Understanding which KPIs drive the overall ROI metric will be important for your organization.

Specific Metrics for Each Chatbot Analysis

Below, I’m sharing with you each of the categories and a high-level list of metrics that you should consider using to measure your chatbot’s success as it is rolled out. Below each list, I’ll describe what each metric measures.

User Adoption Success Metrics

This is a list of chatbot user adoption metrics that you should choose to measure as KPIs in your business. This grouping of KPIs should feed into the conversation for the other categories for performance and business value.

The goal of measuring user adoption is to ensure that the marketing and communication teams are doing what they can to promote the chatbot and/or its skills to ensure that they are getting the value that was communicated when journeying into the development phase. Again, the term users can mean an employee (for an internal chatbot) or a client/potential client (for an external chatbot):

Audience
- Total Number of Users
- New Users
- Returning Users
- Volunteer Users
- Engaged Users
- Users by Channel
Session Details
- Chat Volume and Sessions
- Average Daily Sessions
- Number of Sessions Initiated
Conversation Details
- Total Conversations
- Conversation Starter Response Rate (Proactive/Push Message Rate)
- New Conversations
- Interactions Per User
- Interaction Rate
- Conversation Duration
- Activation Rate

Audience

Total Number of Users

The total number of users captures how many employees or customers are using your chatbot.

New Users

The number of new users of a chatbot provides real-time insight into its popularity, which makes it a good indicator of marketing and communication success.

Returning Users

Returning users are the active users who have had repeated sessions with your bot within a given timeframe (for example, 2 years).

Volunteer Users

Volunteer users are the people who interact with your chatbot without any type of direct marketing or communication about the bot. This means that these users recognize the value of the chatbot and are interested in what the bot has to offer. These types of interactions, unprompted and unscripted, are the best ways to gauge the adoption of your chatbot outside of the current targeted segmented audience.

Engaged Users

Engaged users are users who have had repeated sessions within a shorter time period, such as on a daily or weekly basis.
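One way to make the returning versus engaged distinction concrete is to count sessions per user over two windows. The sketch below assumes a hypothetical session log of (user_id, date) pairs and illustrative thresholds (two or more sessions, and a 7-day window for "engaged"). None of these names or values come from a particular chatbot platform, so adjust them to your own definitions.

```python
from collections import Counter
from datetime import date, timedelta


def user_segments(sessions, today, window_days=7, min_sessions=2):
    """Split users into returning and engaged sets.

    sessions: iterable of (user_id, session_date) pairs (hypothetical log format).
    Returning = at least `min_sessions` sessions anywhere in the log.
    Engaged   = at least `min_sessions` sessions in the last `window_days` days.
    """
    sessions = list(sessions)
    cutoff = today - timedelta(days=window_days)
    all_counts = Counter(user for user, _ in sessions)
    recent_counts = Counter(user for user, day in sessions if day > cutoff)
    returning = {u for u, n in all_counts.items() if n >= min_sessions}
    engaged = {u for u, n in recent_counts.items() if n >= min_sessions}
    return returning, engaged


# Illustrative data only.
log = [
    ("ana", date(2024, 3, 1)),
    ("ana", date(2024, 3, 28)),
    ("ana", date(2024, 3, 30)),
    ("ben", date(2024, 1, 5)),
    ("ben", date(2024, 3, 29)),
    ("cy", date(2024, 3, 30)),
]
returning, engaged = user_segments(log, today=date(2024, 3, 31))
print(sorted(returning))  # ['ana', 'ben']
print(sorted(engaged))    # ['ana']
```

Here "ben" counts as returning but not engaged, because only one of his sessions falls inside the last seven days.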

Users by Channel

A chatbot with the same branding can often appear on multiple platforms (to help engage with audiences where they meet your brand). Being able to measure where users are engaging is important so that you can focus on improving the most popular channels if your resources are limited.

Session Details

A session is defined as a group of user conversations with your chatbot that take place within a given timeframe. A single user can open multiple sessions, and those sessions can occur on the same day or over several days, depending on how your bot sets up its data logging and capturing.
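Platforms differ in how they draw session boundaries, so it can help to see one possible rule written out. The sketch below assumes a simple inactivity rule: a new session starts when a user has been silent for longer than a chosen gap. The 30-minute default and the function name are illustrative assumptions, not a standard.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, gap=timedelta(minutes=30)):
    """Count one user's sessions; a new session starts after `gap` of inactivity."""
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > gap:
            sessions += 1
    return sessions


# Illustrative message timestamps for a single user.
messages = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),   # same session (10 minutes later)
    datetime(2024, 1, 1, 14, 0),   # new session later the same day
    datetime(2024, 1, 2, 8, 30),   # new session the next day
]
print(count_sessions(messages))  # 3
```

Whatever rule you pick, apply it consistently, since total sessions and average daily sessions both depend on it.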

Chat Sessions

This is the total number of sessions that your chatbot has over the course of a period. You can use this to track adoption rates period over period.

Average Daily Sessions

Average daily sessions is a great way to assess the usefulness of your chatbot. It can be a good indicator to the quality and value of the content and service the chatbot is providing.

Number of Sessions Initiated

This is a great way to measure how often users are interacting with your chatbot on a daily basis, especially those audiences that are repeat users.

Conversation Details

Total Conversations

This is the number of conversations that the chatbot is able to handle. This number should go up in parallel with the total number of users.

Conversation Starter Response Rate (Proactive/Push Message Rate)

These are the messages proactively started by the chatbot, where you are trying to elicit a response from the user as part of retention marketing. The response rate is the share of those messages that users reply to.
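As a rough sketch, this rate can be computed as replies divided by proactive messages sent. The record format (a dict with a "user_replied" flag) is a hypothetical one chosen for illustration. Your platform will likely expose this information differently.

```python
def starter_response_rate(proactive_messages):
    """Share of bot-initiated messages that received a user reply.

    proactive_messages: list of dicts with a boolean 'user_replied' key
    (hypothetical record format).
    """
    if not proactive_messages:
        return 0.0
    replied = sum(1 for message in proactive_messages if message["user_replied"])
    return replied / len(proactive_messages)


# Illustrative data: five push messages, two of which got a reply.
sent = [
    {"user_replied": True},
    {"user_replied": False},
    {"user_replied": False},
    {"user_replied": True},
    {"user_replied": False},
]
rate = starter_response_rate(sent)
print(f"{rate:.0%}")  # 40%
```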

New Conversations

Evaluating the number of new conversations (both successful and failed attempts) helps to understand the varying degree of topics that users expect to have with a bot. The goal with new conversations and measuring them is to figure out whether users are finding it intuitive to get what they need and for the business to provide value.

Interactions Per User

This chatbot metric reflects the chatbot’s ability to engage in a fulfilling conversation. The number of interactions indicates how far down the conversational pathway a user was able to go. The ideal length and number of pathways may vary greatly from user to user, especially if the chatbot is present on multiple platforms with multiple types of audience segments.

Interaction Rate

This measures the number of messages exchanged during a conversation between the user and the bot. It is the percentage counterpart to the previous metric.

Conversation Duration

The duration of a chatbot dialogue with a human measures how long the user interacts with the bot. What counts as a good value for this metric varies with the type of bot being implemented. Short conversations may indicate that the chatbot quickly answered the user’s question, or that the user could not find what they needed and left. Long conversations may indicate that the chatbot could not resolve the request quickly and needed multiple pathways to provide value, or that the chatbot is designed to ask more questions in order to fulfill the request.

Activation Rate

The activation rate is the metric determining the number of users that engage in more than one task, thus opening the door to more interactions. This metric is the first to indicate that users are interested in having the chatbot resolve more than the task they originally came for.
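A minimal sketch of this calculation follows. It assumes that "more than one task" means more than one distinct task per user, and that task events are available as hypothetical (user_id, task_name) pairs. Both are assumptions made for illustration.

```python
from collections import defaultdict


def activation_rate(task_events):
    """Share of users who used more than one distinct task.

    task_events: iterable of (user_id, task_name) pairs (hypothetical format).
    """
    tasks_by_user = defaultdict(set)
    for user, task in task_events:
        tasks_by_user[user].add(task)
    if not tasks_by_user:
        return 0.0
    activated = sum(1 for tasks in tasks_by_user.values() if len(tasks) > 1)
    return activated / len(tasks_by_user)


# Illustrative data: "ben" repeats one task, so he is not counted as activated.
events = [
    ("ana", "reset_password"),
    ("ana", "check_balance"),
    ("ben", "reset_password"),
    ("ben", "reset_password"),
    ("cy", "faq"),
    ("dee", "faq"),
    ("dee", "book_meeting"),
]
print(activation_rate(events))  # 0.5
```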

Performance Success Metrics

The technical engineering leader and data science teams are generally accountable for these metrics. The overall purpose of this is to provide a dashboard on how quickly the responses are happening as well as ensuring that the language is being understood and categorized appropriately.

Information and Knowledge Retrieval Rate
Follow-up Conversation Rate
Missed Intent Rate
Artificial Intelligence and Machine Learning Rate
Chatbot Response Time
API Transaction Call Time (per integration)
Chat Reset

Information and Knowledge Retrieval Rate

This metric evaluates whether or not the chatbot can find and display the information that the user requested. This metric should not be used alone, as there is a high probability of false positives. While information and knowledge retrieval problems aren’t unique to bots (for example, users sometimes contact human agents multiple times), you have a lot of control over improving this metric because the conversation logs show what the user wanted the chatbot to do.

Follow-up Conversation Rate

This chatbot metric allows you to determine how well your fallback dialogue is able to re-route the user to further clarify their intent and provide a successful outcome through the use of multiple branches of dialogue flows.

Missed Intent Rate

The difference between this and the confusion rate is that a missed intent means the chatbot has no idea what the user is trying to do, whereas the confusion rate generally measures confusion within an intent.
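One hedged way to operationalize a missed intent is to count messages where the language model returned no intent at all, or returned one with very low confidence. The (intent, confidence) pair format and the 0.5 threshold below are illustrative assumptions, not values from any specific NLU service.

```python
def missed_intent_rate(predictions, min_confidence=0.5):
    """Share of messages with no usable intent.

    predictions: list of (intent_or_None, confidence) pairs (hypothetical format).
    A message is "missed" when no intent was returned or its confidence
    falls below `min_confidence` (an assumed threshold).
    """
    if not predictions:
        return 0.0
    missed = sum(
        1
        for intent, confidence in predictions
        if intent is None or confidence < min_confidence
    )
    return missed / len(predictions)


# Illustrative data: one message with no intent, one with low confidence.
preds = [
    ("check_balance", 0.92),
    (None, 0.0),
    ("faq", 0.31),
    ("reset_password", 0.77),
]
print(missed_intent_rate(preds))  # 0.5
```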

Artificial Intelligence and Machine Learning Rate

This is the ability to have an autonomous system in which the dialogue that the bot is confused with initially can be re-learned and re-programmed after a certain time period. Being able to compare similar dialogue groupings from a historical period to current period can show how well your chatbot can adapt to learning and adding to its corpus of knowledge and skill sets it has.

Chatbot Response Time

This is the time between the chatbot receiving an input and sending a response to the user. If the response time starts to lag, you can troubleshoot to see if there is an issue with some of the backend process connections.
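Averages can hide the slow responses that users actually notice, so it is common to track percentiles alongside the mean. The sketch below uses a simple nearest-rank percentile over a list of response times in milliseconds. The sample values and the choice of the 95th percentile are illustrative assumptions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative response times in milliseconds; one request is very slow.
latencies_ms = [120, 150, 180, 200, 2500]
print(percentile(latencies_ms, 50))            # 180
print(percentile(latencies_ms, 95))            # 2500
print(sum(latencies_ms) / len(latencies_ms))   # 630.0
```

In this sample the median looks healthy while the 95th percentile exposes the slow request, which is the kind of lag worth tracing to a backend connection.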

API Transaction Call Time

The ability to monitor and alert your technical team on issues with data connections or microservices is important in ensuring that your users have the best knowledge possible.

Chat Reset

The chat reset allows you to see the number of times users get confused and are trying to get out of a dialogue. The “reset” dialogue can also be seen as a “cancel” dialogue. The reset dialogue generally will have the ability to re-route the conversation for the users on the skills that a chatbot does have.

Business Value Success Metrics

User Metrics
- Goal Completion Rate
- Speech Sentiment and Analysis
- Retention Rate
- Satisfaction Rate
- User Perspective
- Human vs Chatbot Interaction
Intent and Confusion Metrics
- Fallback Rate
- Confusion Rate
Revenue Growth
- Self Service Rate
- Ticket Deflection Rate
- Human Takeover Rate

User Metrics

For the business value categories, the user metrics are specific to the goal of achieving a business function, not just identifying the type of audience reach.

Goal Completion Rate

The completion rate allows you to determine whether or not the chatbot met the goal of reducing the expenses for higher levels of service (like chatting with a human agent). The purpose of this metric is to see the percentage of users that reach the goal that your chatbot was designed to accomplish, not just the number of interactions.
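Because each skill usually has its own goal, it can be useful to break the completion rate down by skill. The sketch below assumes a hypothetical log of (skill, goal_reached) pairs, one per session. How "goal reached" is detected depends on your bot’s design and is not specified here.

```python
from collections import defaultdict


def completion_rate_by_skill(sessions):
    """Goal completion rate per skill.

    sessions: iterable of (skill_name, goal_reached) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    completed = defaultdict(int)
    for skill, goal_reached in sessions:
        totals[skill] += 1
        if goal_reached:
            completed[skill] += 1
    return {skill: completed[skill] / totals[skill] for skill in totals}


# Illustrative data only.
skill_sessions = [
    ("faq", True),
    ("faq", True),
    ("faq", False),
    ("faq", True),
    ("order_status", True),
    ("order_status", False),
]
rates = completion_rate_by_skill(skill_sessions)
print(rates)  # {'faq': 0.75, 'order_status': 0.5}
```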

Speech Sentiment Analysis and Feedback

Being able to determine the sentiment of the user provides ways to improve the experience for an end-user, so that you can increase adoption rates, and also the goal completion rate.

Chatbot Retention Rate

Chatbot analytics can record the percentage of users who return to interact with the chatbot during a specific period of time. Retention rates will vary from industry to industry, but you can tailor the metric to your specific niche by adjusting the time frames it covers.
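A simple period-over-period version of this metric compares the users seen in one period with those seen in the next. The sketch below assumes you can export a list of active user IDs per period. The monthly periods and the function name are illustrative choices.

```python
def retention_rate(previous_period_users, current_period_users):
    """Share of the previous period's users who came back in the current period."""
    previous = set(previous_period_users)
    if not previous:
        return 0.0
    returned = previous & set(current_period_users)
    return len(returned) / len(previous)


# Illustrative data: two of January's four users return in February.
january = ["ana", "ben", "cy", "dee"]
february = ["ana", "dee", "eli"]
print(retention_rate(january, february))  # 0.5
```

New users in the current period ("eli" here) do not affect the result, since retention only looks at who came back.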

Chatbot Satisfaction Rate

As an added skill, a “feedback loop” can be added to evaluate how helpful the bot is to the end user. This will help identify areas for improvement such as the user experience, response types, irrelevant answers, and bad pathways. Questions to ask at the end of a chat session include:

Was the chatbot able to understand you?
Was the chatbot able to respond to your specific question?
Was there a resolution to your problem?
Did you have to work with anyone else from the business to get a final resolution to your question or problem?

User Perspective Rate

This metric observes the quality of an interaction between a user and either the chatbot or a human assistant.

Human versus Chatbot Interaction

This is a good metric for determining how efficiently your chatbot deflects requests from support queues that are staffed by humans. If there is an increase in escalations to the human pathway, this should be a signal that more content or a better user experience is needed to address the user’s intent.

Intent and Confusion Metrics

Fallback Rate

The fallback rate captures information about scenarios where the chatbot is not able to understand the dialogue from the user, and needs to be able to redirect or provide a closely related solution.

Confusion Rate

Confusion rate works alongside the fallback rate in that it helps measure the customer intent. One of the biggest issues with chatbot building is that they have a very narrow intent coverage. Being able to get insight on the types of tasks your users are looking for a chatbot to do will allow you to roadmap out the next set of skills based on what users are asking for.

Revenue Growth

Self Service Rate

This allows you to see the number of times users are able to self-service with a chatbot and complete a task that previous workflows had humans working on.

Ticket Deflection Rate

This is the reduction in the average number of tickets over time due to the implementation of a chatbot.
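Expressed as a rate, this is the drop in average ticket volume relative to a pre-chatbot baseline. The sketch below assumes you have average ticket counts for comparable periods before and after launch. The figures are made up for illustration, and the calculation does not by itself prove the chatbot caused the change.

```python
def ticket_deflection_rate(baseline_avg_tickets, current_avg_tickets):
    """Relative reduction in average ticket volume versus the baseline period."""
    if baseline_avg_tickets <= 0:
        raise ValueError("baseline_avg_tickets must be positive")
    return (baseline_avg_tickets - current_avg_tickets) / baseline_avg_tickets


# Illustrative figures: 400 tickets per week before launch, 300 after.
print(ticket_deflection_rate(400, 300))  # 0.25
```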

Human Takeover Rate

This metric quantifies the number of times a user has had to go talk to a human after using the chatbot, a sign that the chatbot could not deliver a satisfactory result to the end user.
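The self-service rate and the human takeover rate can be read off the same data if each session is labeled with how it ended. The sketch below assumes three hypothetical outcome labels ("self_served", "escalated", "abandoned"). Your own logging may use different categories.

```python
from collections import Counter


def outcome_rates(outcomes):
    """Share of sessions ending in each outcome label."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Illustrative data: ten sessions with hypothetical outcome labels.
session_outcomes = ["self_served"] * 6 + ["escalated"] * 3 + ["abandoned"]
outcome_shares = outcome_rates(session_outcomes)
print(outcome_shares["self_served"])  # 0.6 (self-service rate)
print(outcome_shares["escalated"])    # 0.3 (human takeover rate)
```

Keeping an explicit "abandoned" label matters, because a session that ends without escalation is not necessarily a successful self-service.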

Summary of Chatbot Measurement

Consumer demand for chatbots is very clear right now, as they are asking for simple and effective customer service dialogue that is frictionless. Being able to deliver this right out of the gate means ensuring that you are evaluating the effectiveness of your chatbot using the right KPIs and measurement benchmarks.

Defining objectives, goals, and business metrics to review and analyze is the foundation for ensuring that your chatbot continues to provide the insights that your business needs. Companies that are implementing a chatbot or conversational agent are at the forefront of removing barriers between application and human silos so that they can provide value to end users. However, chatbots need monitoring and training in order for the business to continue to see value from them.

There is no one standard for measuring a chatbot’s success. The chatbot success metrics should correlate with the audience segment and the bot’s use case. At the end of the day, the chatbot should do what it was designed to do, leaving current/potential users happy that it’s been able to fulfill its intended role.