Picture my voice
Close your eyes and think of a voice that is fresh in your memory but that you have no face attached to. Someone you’ve never met, but have maybe heard in a podcast. Now play that voice back in your head and, while doing so, try to create a mental image of what that person looks like. Chances are you’ll decide whether it’s a man or a woman, estimate his or her age and maybe even picture skin or hair color. Perhaps the voice suggests some prominent feature: he’s bald, she has a big nose, and so on.
Now is there any truth to this mental image or is it just something that your mind has cooked up? If there is, how close to reality can we get if we know what to look for? And what if the voice belongs to a brand or a voice assistant, what would that look like?
Our voice is an incredible instrument, capable of producing a huge array of sound frequencies and intricate combinations. From clicks to speech to singing. And while most of us can control many aspects of our voice, we are also bound by our anatomy.
The sound is produced in our “voice box”, the larynx. Our lungs supply the air flow and air pressure that make our vocal cords vibrate and thereby create sound. The muscles of the larynx stretch the cords and adjust their length and tension to tune the pitch. Before the sound is released into the air, our tongue, lips and cheeks act as filters that shape it further.
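How length and tension set the pitch can be sketched with the ideal-string approximation that is sometimes used as a first-order model of the vocal cords: F0 = (1 / 2L) * sqrt(stress / density). This is a simplification rather than a full model of the voice, and the cord lengths, tissue stress and tissue density used below are illustrative assumptions, not measurements from this article. The function name is hypothetical.

```python
import math


def fundamental_frequency(length_m, stress_pa, density_kg_m3=1040.0):
    """Pitch (Hz) of an ideal vibrating string, used here as a rough
    stand-in for a vocal cord.

    length_m      -- vibrating length of the cord in metres (assumed value)
    stress_pa     -- tension per cross-sectional area in pascals (assumed value)
    density_kg_m3 -- tissue density; 1040 kg/m^3 is an assumed, water-like figure
    """
    if length_m <= 0 or density_kg_m3 <= 0 or stress_pa < 0:
        raise ValueError("length and density must be positive, stress non-negative")
    # Wave speed along the cord divided by twice its length.
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)


if __name__ == "__main__":
    # Same assumed tension, two assumed cord lengths.
    for label, length in (("shorter cord (10 mm)", 0.010),
                          ("longer cord (16 mm)", 0.016)):
        print(f"{label}: {fundamental_frequency(length, 15_000.0):.0f} Hz")
```

With the tension held equal, the longer cord comes out at the lower pitch, and raising the stress (stretching the cord) raises the pitch.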
Voice to face matching
Because of the mechanics of our voice, personal traits such as gender, age and the shape of the mouth affect the sound. Longer cords can produce a lower pitch, stronger lungs can sustain a note for longer, and so on. We have learnt to decipher these variations in sound into understandable words and expressions of emotion, but we’ve also been trained to understand who they belong to. Just by listening.
A study from 2016 showed a few interesting correlations. Height: taller people usually have lower voices. Masculinity/femininity: a more “feminine” look often goes with a more “feminine” voice. Health: people who looked healthy also sounded healthy.
We can usually distinguish a male from a female (or maybe not anymore, as this voice project illustrates), or an 80-year-old from a 20-something. But the more features two speakers have in common, the harder it becomes. This is where machine learning can play an important role.
Advancements in machine learning
While older studies relied on manual work with questionnaires, the rise of ML has kick-started automated research. Convolutional Neural Networks (CNNs) can be trained with real-life material from interviews, presentations and instruction videos to find correlations between voices and pictures of faces. Publicly available datasets with some 7000 identities and over 1 million utterances have given research teams new fuel for studies.
One such study shows that machines can match human performance in easy scenarios, such as speakers of different gender, nationality or age, and exceed human performance in more complex scenarios, e.g. where gender, nationality and age are similar. In other words, the computers outperform us at finding the face that goes with a voice.
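One common way to frame such a matching task, assumed here rather than taken from the study, is to turn each voice and each face into a list of numbers in a shared space and pick the face whose numbers point in the direction closest to the voice’s. The sketch below uses made-up three-number vectors in place of what a trained network would produce, and `cosine_similarity` and `match_voice_to_face` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors, from -1 (opposite) to 1 (same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def match_voice_to_face(voice_vec, face_vecs):
    """Return the index of the candidate face most similar to the voice."""
    scores = [cosine_similarity(voice_vec, face) for face in face_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up vectors; a real system would get these from trained networks.
    voice = [0.9, 0.1, 0.3]
    candidate_faces = [
        [0.1, 0.8, 0.2],  # face 0
        [0.8, 0.2, 0.4],  # face 1
    ]
    best = match_voice_to_face(voice, candidate_faces)
    print("Best matching face index:", best)
```

The hard part, which this sketch leaves out entirely, is training the networks so that a voice and the face it belongs to actually end up close together.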
In a recent paper, researchers describe how they train a CNN on the dataset mentioned above and then let the computer generate a picture of what a speaker could look like. Compared to the actual speaker, the results are impressive.
Test your skills
Try it yourself at the link below to see if you can accurately match a voice to a face. Listen to the voice and select the face you think belongs to it. We will summarize your results for you.
What does Siri look like?
With the rise of voice assistants and podcasts, voice character and expression have become something for brands to care about. The previously robot-sounding voice has become more personal, and as such it now represents a brand and a service in a much deeper way than a voice-over for a TV commercial ever did.
We can expect brands to carefully consider their choice of a representative voice “persona” to best express the brand and its position. Male or female? Age? Timbre? Most assistants are (relatively) young females, like the Google Assistant, Siri, Bixby (Samsung) or Alexa. These personas definitely build on their personality traits in the way they express themselves (snotty, humorous) but also in the way they sound. Interestingly, these brands don’t offer a picture of their servants. It’s in the “ear” of the beholder. Still, given what we’ve learnt above, wouldn’t most people have a similar mental image of what they look like?
More than anything, these images offer an interesting look into the human psyche. They show, not unexpectedly, the common traits described earlier. However, as they are artistic representations, there’s a lot of room for interpretation. Maybe that’s why these brands don’t offer a picture to go with the voice: because the fantasy of the persona is a stronger brand builder than a real image? With their voices they vaguely depict a smart, healthy, cheerful, calm, dependable (as long as there’s data service), warm woman. It’s up to you to fill in the rest and make it part of your life. We can fall in love with a voice.
However, from a branding perspective, the more neutral and undefined we make the persona, the less it will stand out and the harder it will be to take a position against the competition. Was that Apple? Or Google? Or Samsung? Or Facebook? Who is talking to me?
Last year, Google announced that the Google Assistant can be (sound like) John Legend. A real person, with a real face. Maybe that offers some clue to their strategic thinking, but we’re still just at the beginning.
What we know for sure is that further advancements in AI and ML will not leave sound and voice untouched, and it is up to brands to use this to build better experiences, brands and future products.
Plan8 is a design agency creating music and sound for brands, products and experiences. Plan8 operates at the intersection of audio, emerging technologies, brands and art. Our team consists of music and sound designers, technologists and audio strategists. Founded in 2008, we have offices in Stockholm and LA and work globally. We currently tune the future together with clients like Google, IKEA, Bird and Spotify.
Follow our work on IG: @plan8music WEB: www.plan8.se