Could YOU spot a deepfake? Scientists find humans struggle to detect AI speech even when they’ve been trained to look out for it

  • Deepfakes are fake videos or audio clips intended to resemble a real person
  • Study found that people were unable to detect over a quarter of AI speech samples

Humans are unable to detect over a quarter of speech samples generated by AI, researchers have warned.

Deepfakes are fake videos or audio clips intended to resemble a real person’s voice or appearance.

There are growing fears this kind of technology could be used by criminals and fraudsters to scam people out of money.

Now, scientists have discovered people can only tell the difference between real and deepfake speech 73 per cent of the time.

While early deepfake speech may have required thousands of samples of a person’s voice to be able to generate original audio, the latest algorithms can recreate a person’s voice using just a three-second clip of them speaking.

Real or deepfake? Humans are unable to detect over a quarter of speech samples generated by AI, researchers have warned. The video gives nine examples of the AI speech samples that people were played during the study. Scroll down to the bottom of the article to find out which are real and which are deepfakes


The technology behind deepfakes was developed in 2014 by Ian Goodfellow, who went on to become the director of machine learning at Apple’s Special Projects Group and a leader in the field.

The word is a combination of the terms ‘deep learning’ and ‘fake’, and refers to a form of artificial intelligence.

The system studies a target person in pictures and videos, allowing it to capture multiple angles and mimic their behavior and speech patterns.

The technology gained attention during the election season, as many feared developers would use it to undermine political candidates’ reputations.

A team from University College London used an algorithm to generate 50 deepfake speech samples and played them for 529 participants.

They were only able to identify fake speech around three quarters of the time, which improved only slightly after they received training to recognise aspects of deepfake speech.

Kimberly Mai, first author of the study, said: ‘Our findings confirm that humans are unable to reliably detect deepfake speech, whether or not they have received training to help them spot artificial content.

‘It’s also worth noting that the samples that we used in this study were created with algorithms that are relatively old, which raises the question whether humans would be less able to detect deepfake speech created using the most sophisticated technology available now and in the future.’

Tech firm Apple recently announced software for iPhone and iPad that allows a user to create a copy of their voice using 15 minutes of recordings.

Documented cases of deepfake speech being used by criminals include one 2019 incident where the CEO of a British energy company was convinced to transfer hundreds of thousands of pounds to a false supplier by a deepfake recording of his boss’s voice.

At the end of March, a deepfake photo of Pope Francis wearing an enormous white puffer jacket (left) went viral and fooled thousands into believing it was real. Social media users also debunked a supposedly AI-generated image of a cat with reptilian black and yellow splotches on its body (right), which had been declared a newly-discovered species

But despite growing fears over the technology, it can also be beneficial – for example, for those whose speech is limited or who may lose their voice due to illness.

Professor Lewis Griffin, senior author of the study, said: ‘With generative artificial intelligence technology getting more sophisticated and many of these tools openly available, we’re on the verge of seeing numerous benefits as well as risks.

‘It would be prudent for governments and organisations to develop strategies to deal with abuse of these tools, certainly, but we should also recognise the positive possibilities that are on the horizon.’

The findings were published in the journal PLOS ONE.

Answers: 1 – Real, 2 – Fake, 3 – Fake, 4 – Fake, 5 – Real, 6 – Real, 7 – Real, 8 – Fake, 9 – Fake. 

How to spot a deepfake: 15 signs to look out for


1. Unnatural eye movement. Eye movements that do not look natural — or a lack of eye movement, such as an absence of blinking — are huge red flags. It’s challenging to replicate the act of blinking in a way that looks natural. It’s also challenging to replicate a real person’s eye movements. That’s because someone’s eyes usually follow the person they’re talking to.

2. Unnatural facial expressions. When something doesn’t look right about a face, it could signal facial morphing. This occurs when one image has been stitched over another.

3. Awkward facial-feature positioning. If someone’s face is pointing one way and their nose is pointing another way, you should be skeptical about the video’s authenticity.

4. A lack of emotion. You also can spot what is known as ‘facial morphing’ or image stitches if someone’s face doesn’t seem to exhibit the emotion that should go along with what they’re supposedly saying.

5. Awkward-looking body or posture. Another sign is if a person’s body shape doesn’t look natural, or there is awkward or inconsistent positioning of head and body. This may be one of the easier inconsistencies to spot, because deepfake technology usually focuses on facial features rather than the whole body.

6. Unnatural body movement or body shape. If someone looks distorted or off when they turn to the side or move their head, or their movements are jerky and disjointed from one frame to the next, you should suspect the video is fake.

7. Unnatural colouring. Abnormal skin tone, discoloration, weird lighting, and misplaced shadows are all signs that what you’re seeing is likely fake.

8. Hair that doesn’t look real. In a deepfake, you won’t see frizzy or flyaway hair. Why? Fake images struggle to generate these individual characteristics.

9. Teeth that don’t look real. Algorithms may not be able to generate individual teeth, so an absence of outlines of individual teeth could be a clue.

10. Blurring or misalignment. If the edges of images are blurry or visuals are misaligned – for example, where someone’s face and neck meet their body – you’ll know that something is amiss.

11. Inconsistent noise or audio. Deepfake creators usually spend more time on the video images than on the audio. The result can be poor lip-syncing, robotic-sounding voices, strange word pronunciation, digital background noise, or even the absence of audio.

12. Images that look unnatural when slowed down. If you watch a video on a screen that’s larger than your smartphone or have video-editing software that can slow down a video’s playback, you can zoom in and examine images more closely. Zooming in on lips, for example, will help you see if they’re really talking or if it’s bad lip-syncing.

13. Hashtag discrepancies. There’s a cryptographic algorithm that helps video creators show that their videos are authentic. The algorithm is used to insert hashtags at certain places throughout a video. If the hashtags change, then you should suspect video manipulation.

14. Digital fingerprints. Blockchain technology can also create a digital fingerprint for videos. While not foolproof, this blockchain-based verification can help establish a video’s authenticity. Here’s how it works. When a video is created, the content is registered to a ledger that can’t be changed. This technology can help prove the authenticity of a video.

15. Reverse image searches. A search for an original image, or a reverse image search, can unearth similar videos online to help determine whether an image, audio clip, or video has been altered in any way. While reverse video search technology is not yet publicly available, investing in a tool like this could be helpful.
