Anyone who owns a device with a built-in virtual assistant, like Amazon Alexa, Siri or Google Assistant, knows the frustration of asking it to do one thing, only to get a completely different and random result! This happened to me this morning when I was getting ready for work. I sometimes start my day by asking Alexa to play one of my music playlists, and this morning, instead of getting my Earth, Wind and Fire playlist, I got a story! Of course, she asked me to confirm whether I wanted to hear the story, and I declined. It just was not the way I wanted to start my day: listening to some brooding melodrama while I brushed my teeth. But the offer piqued my interest, and I got to wondering: is Alexa a good narrator?
There are so many conversations happening among audiobook narrators about AI narrating stories and potentially supplanting us real narrators that I thought I would assess whether Alexa could actually compete with some of my favorite narrators, and even me! The consensus in the narrator space has been that while AI narration is inevitable, we would mostly see it used for non-fiction. In this particular case, however, Alexa narrated a work of fiction: after initially declining, I did in fact go back and ask her to read me a story.
Here is my review of her performance of the title ‘Camp Blues’ (you can also listen to the actual story by clicking on the link below):
Alexa did a pretty good job with the one adolescent boy’s character voice in the story. She actually brought her pitch down slightly to mimic a 13-year-old boy. She did not do as well with character distinction, however. Alexa used her “natural voice” for both the mom and the counselor in the story, who are both women, and there was no clear distinction between the two. The same was true of the protagonist and his bunkmate.
Alexa did a pretty good job of narrating the arc of the story, but this could be attributed to the story simply having been written with a clear beginning, middle and end. I feel like that is an easy one to execute, as long as those elements are present in the story. In my opinion, a well-executed narrative arc really comes down to the mechanics of the writing. Nonetheless, it is impressive for AI technology to pull that off.
Alexa did an OK job of narrating some of the transitions in the story; however, not all of them were well executed. There are instances where the story just runs together, with no vocal changes or tonal shifts to signal a narrative shift. Which leads me to my next observation…
There is little to no emotion in the narration! In fact, she sounds quite monotone throughout the story. As I said earlier, both women sound alike, and so do both boys. At one point the counselor makes a statement that can be interpreted as indifference towards the boy, yet there is no real expression of that indifference in her voice. At another point, while voicing the mom, Alexa tries to inject some excitement and anticipation into her narration, but it feels like one of those toy rocket launchers where your kid stomps on the plastic yellow pump expecting the rocket to soar 100 feet into the air, but instead it sails about 5 feet before lodging in your barberry shrub!
It is very clear in the story that the boy is full of anxiety and nervousness about going to a camp where he knows no one, but you only know this because of the words used to describe his emotional state; you don’t pick up on it through Alexa’s characterization of him.
Being able to connect with a story, elicit emotion, and display vocal flexibility, variability and range are key elements of narration. This emotional intelligence allows narrators to deliver performances as varied, unique and individual as they are, and even the same narrator reading the same scene or story twice will never produce two identical takes! That is one of the key differences between a human narrator and an AI narrator: variation! While writing this blog entry, I listened to Alexa’s narration of the same story more than seven times, and each time it sounded exactly like the first. No change, no variation, no emotion, just a voice on autopilot!
This is not to say that there isn’t AI technology out there that can do a better job at storytelling and narration than Alexa. Some companies claim their AI technology is sophisticated enough to mimic the nuances of human emotion and vocal variability in narration, like the London-based www.deepzen.io, which claims to be able to clone your voice as a narrator, allowing you to produce audio content at scale without ever going into the booth! This article provides some insight into which industries are using AI technology for storytelling and narration, and how.
If I were to rate Amazon Alexa’s narration of Camp Blues, I would honestly give her 3 stars. Her narration feels a bit mechanical to me, but it still manages to include some basic elements of good narration. Does she sound better than me? You be the judge! Take a listen for yourself and let me know your thoughts.