In many organizations, call recording is a means to an end. Once the audio is captured, recorded calls are transcribed and mined for meaningful keywords and phrases like "mad", "unhappy", or "cancel". Conversational analytics engines automatically identify these words and can alert managers, team leaders, and/or quality evaluators, who can use the relevant sections of an interaction to better coach underperforming agents.
The accuracy of that analytics process depends not only on the quality of the analytics software but also on the quality of the audio being transcribed. In fact, the transcription stage of a recorded call's life can go one of two ways, depending on the quality of the audio itself:
1. Spoken words are clearly identified and discerned from each party (e.g., customer and agent)
This happens when the recording system captures and replays each party on the call on its own recording channel. Rather than capturing mono audio, these solutions isolate each voice, so transcription engines can process each speaker separately. This separation leads to significantly higher transcription accuracy and minimizes erroneous results, which can mislead and waste time.
2. Spoken words are jumbled and misidentified
This often occurs when single-channel (mono) recording solutions are used, which capture both parties on the call on the same recording channel. We all know customer service calls can become contentious, and overtalk does occur. When this happens, the transcription engine has trouble discerning what each individual said.
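To make the difference concrete, here is a minimal Python sketch of what a dual channel recording allows. It assumes a 16-bit stereo WAV file with the agent on one channel and the customer on the other; the function names, the frame length, and the energy threshold are illustrative assumptions, not part of any particular recording product. Because each voice sits on its own channel, overtalk can be located simply by checking where both channels are loud at the same time.

```python
import sys
import wave
from array import array


def read_stereo_wav(source):
    """Return (left, right, sample_rate) from a 16-bit stereo WAV path or file object."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit stereo audio")
        rate = wav.getframerate()
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Stereo samples are interleaved: L, R, L, R, ...
    return samples[0::2], samples[1::2], rate


def frame_energy(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def overtalk_frames(left, right, frame_len, threshold):
    """Indexes of frames where both channels exceed the energy threshold."""
    pairs = zip(frame_energy(left, frame_len), frame_energy(right, frame_len))
    return [i for i, (a, b) in enumerate(pairs) if a > threshold and b > threshold]
```

With a mono recording the two voices are already summed into one stream of samples, so there is nothing comparable to split, which is why overtalk is so hard on the transcription engine.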
Speech recognition/transcription software works by breaking down recorded audio into individual sounds. It then analyzes each sound, using custom algorithms to identify the most likely word fit. Once determined, those sounds are transcribed into text.
"Converting speech to text works through a complex machine learning model that involves several steps. Let's take a closer look at how this works:
- When sounds come out of someone's mouth to create words, it also makes a series of vibrations. Speech to text technology works by picking up on these vibrations and translating them into a digital language through an analog to digital converter.
- The analog-to-digital-converter takes sounds from an audio file, measures the waves in great detail, and filters them to distinguish the relevant sounds.
- The sounds are then segmented into hundredths or thousandths of seconds and are then matched to phonemes. A phoneme is a unit of sound that distinguishes one word from another in any given language. For example, there are approximately 40 phonemes in the English language.
- The phonemes are then run through a network via a mathematical model that compares them to well-known sentences, words, and phrases.
- The text is then presented as text, or a computer-based demand based on the audio’s most likely version."
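As a rough illustration of the segmentation step described above, the following Python sketch cuts a stream of audio samples into frames of 10 milliseconds (one hundredth of a second). The function name, the 8,000 Hz sample rate, and the 10 ms frame size are assumptions chosen for the example; real speech engines use their own frame sizes and typically overlap their frames.

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut samples into consecutive frames of frame_ms milliseconds each."""
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame is shorter than one sample")
    # Keep only complete frames; a short trailing remainder is dropped.
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# One second of placeholder audio at an assumed 8,000 samples per second.
one_second = list(range(8000))
frames = split_into_frames(one_second, sample_rate=8000)
print(len(frames), len(frames[0]))  # 100 80
```

Each of these short frames is the kind of slice that would then be matched against phonemes.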
Let's look at an excerpt of a customer service interaction from the two perspectives:
A. Dual channel recording -
Agent - Sorry sir, we cannot issue a refund for this. I wish we could.
Customer - This is crap! I'm going to cancel my account and go somewhere else unless you put me on with a supervisor right now who can make this happen for me.
B. Mono recording -
Agent/Customer - Sorry sir, we cannot issue a refundcancelmy andacccount i wishunlessthisyoucrap for thisunlessyouwishisupervisor right now.
This simple, imaginary example shows how important dual channel recording can be to a business. In the dual channel instance, the transcription software will clearly translate what was spoken into words and the appropriate action can be taken to rescue the disgruntled customer.
With the mono recording/overtalk scenario, the resulting transcription will be flawed. There is no way for the speech-to-text engine to discern what was said during the portions of the call when both the agent and the customer spoke simultaneously. This can result in a lost customer, which is highly problematic because it can cost upwards of 15X more to acquire a new customer than to keep an existing one.
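The effect on keyword spotting can be sketched in a few lines of Python. This is a simplified illustration rather than how any particular analytics engine works: the keyword list comes from the examples at the top of this article, the transcripts are the imaginary ones above, and the function name is made up for the example.

```python
import re

KEYWORDS = ("mad", "unhappy", "cancel")

DUAL_CHANNEL = {
    "Agent": "Sorry sir, we cannot issue a refund for this. I wish we could.",
    "Customer": (
        "This is crap! I'm going to cancel my account and go somewhere else "
        "unless you put me on with a supervisor right now who can make this "
        "happen for me."
    ),
}

MONO = (
    "Sorry sir, we cannot issue a refundcancelmy andacccount i "
    "wishunlessthisyoucrap for thisunlessyouwishisupervisor right now."
)


def find_keywords(text, keywords=KEYWORDS):
    """Return the keywords that appear as whole words in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [k for k in keywords if k in words]


for speaker, text in DUAL_CHANNEL.items():
    print(speaker, find_keywords(text))
print("Mono", find_keywords(MONO))
```

In the dual channel transcript, "cancel" is found and attributed to the customer. In the jumbled mono transcript the same word is buried inside "refundcancelmy", so a whole-word search finds nothing and no alert is raised.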
Many call recording systems only offer mono recording. Is your call recorder dual channel or mono? If it's mono, give our dual channel/stereo recording solution a try for 30 days at no cost.