A suggestion on how to mitigate Cross Audio Request Forgery (CARF)

When the big tech players decided to create voice-activated assistants (Siri/Alexa/Ok Google/Cortana/<insert contender here>), I think they were probably cognisant that attackers would use the opportunity to conduct the audio equivalent of a Cross-Site Request Forgery. If they were, why don’t we have more protection mechanisms available for this?

Current Protections

Since Android M, “Ok Google” has defaulted to using voice training to build a profile of your voice and authenticate audio requests, so someone else’s voice should not trigger a request. This mechanism is not bulletproof, as was shown by these security researchers.

However, based on this mind-blowing talk from Adobe, this type of protection will not hold water for long: the technology now exists to synthesize someone saying anything, provided you have a 20-minute recording of their voice. Given the strong correlation between “fame” and “audio recordings available online”, this will work well for targeted attacks.

Also, something I haven’t confirmed is whether Google’s vocal profiling only covers the trigger phrase “Ok, Google”, which would expose it to vocal replay attacks, or whether it authenticates all the audio it hears. Either way, the Adobe approach above could bypass this protection.

My Suggestions

Of the two “advanced” approaches I prefer the time-based one, as there is no state for each device to maintain. In the event-based approach, imagine you say “Susan, find the nearest…” but the Alexa device does not hear it correctly. If your watch has now incremented its index, you will be out of sync on the names. (Note: a tolerance threshold could be used to solve this.)
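To make the difference concrete, here is a rough sketch of both variants, assuming the rotating name is derived from a shared seed the way HOTP/TOTP derive one-time codes. The name list, the 30-second step, the look-ahead tolerance of 3 and every function and class name are my own illustrative assumptions, not part of any assistant's real API.

```python
import hashlib
import hmac
import struct

# Hypothetical pool of wake names; a real scheme would need a much larger one.
NAMES = ["Susan", "Oliver", "Maya", "Victor", "Hazel", "Felix", "Nora", "Jasper"]


def name_for_counter(seed: bytes, counter: int) -> str:
    """Map a shared seed and a counter to a wake name (HOTP-style)."""
    digest = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(NAMES)
    return NAMES[index]


def time_based_name(seed: bytes, now: float, step: int = 30) -> str:
    """Time-based: the counter comes from the clock, so no state is stored."""
    return name_for_counter(seed, int(now // step))


class EventBasedListener:
    """Event-based: the assistant stores a counter and tolerates missed requests."""

    def __init__(self, seed: bytes, tolerance: int = 3):
        self.seed = seed
        self.counter = 0
        self.tolerance = tolerance

    def accept(self, heard: str) -> bool:
        # Look ahead a few counters in case the watch advanced without us hearing.
        for offset in range(self.tolerance + 1):
            if name_for_counter(self.seed, self.counter + offset) == heard:
                self.counter += offset + 1  # resynchronise past the matched name
                return True
        return False
```

The trade-off shows up in the look-ahead loop: every extra counter the listener tolerates is one more name an attacker could guess, whereas the time-based variant only needs the two clocks to roughly agree.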

Thanks to the independence of state management in the time-based approach, it also accommodates multiple users right out of the gate, and you could even use different seeds for each user. There is no notion of access granularity for voice yet, though, so this would not be great in a byzantine setting, for example an office environment.
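As a rough illustration of the per-user seed idea, the device could check the name it heard against each enrolled user's current time-based name. This is only a sketch under the same assumptions as before (an HMAC-derived name, a 30-second step); the name list, user names, seeds and function names are all hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Dict, List

# Hypothetical pool of wake names.
NAMES = ["Susan", "Oliver", "Maya", "Victor", "Hazel", "Felix", "Nora", "Jasper"]


def current_name(seed: bytes, now: float, step: int = 30) -> str:
    """Derive the wake name for the current time window from a user's seed."""
    counter = int(now // step)
    digest = hmac.new(seed, struct.pack(">Q", counter), hashlib.sha256).digest()
    return NAMES[int.from_bytes(digest[:4], "big") % len(NAMES)]


def matching_users(heard: str, user_seeds: Dict[str, bytes], now: float) -> List[str]:
    """Return every enrolled user whose current name matches what was heard."""
    return [user for user, seed in user_seeds.items()
            if current_name(seed, now) == heard]
```

With a small name pool two users can land on the same name in the same window, which is why this returns a list rather than a single user.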

Something I overlooked? Other protections already available? Comment below :D