I’m Not a Robot: The Rise and Fall of CAPTCHA

Ansh Sehgal
Nov 15, 2020
Google’s reCAPTCHA v2

CAPTCHA tests have been around for nearly 20 years, and they have evolved from simple text-based challenges into complex tests powered by artificial intelligence. One thing remains constant: the frustrating experience of having to actually complete one.

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. As the name suggests, it is designed to keep bots from accessing certain information on the internet. In a Turing test, humans evaluate whether a machine can think like a human. A CAPTCHA reverses the roles: a computer system presents the test to evaluate whether the user is a human or a bot. The idea behind it is promising, but the execution of these tests has left a lot to be desired.

Google reCAPTCHA v1

The most popular form of CAPTCHA is Google’s reCAPTCHA. Google acquired the technology in 2009 for an amount between $10m and $100m, hoping to make it the de facto solution for CAPTCHA needs. You might be wondering why Google paid so much money for this technology when other CAPTCHA solutions already presented similar text-based challenges.

At the time of this deal, Google was working on its Optical Character Recognition (OCR) technology. OCR turns images and various document types into editable and searchable data. Google’s reCAPTCHA took actual clippings from old newspapers and literature, then modified them to be presented as a challenge. Each challenge presents two words: one that is already known to the database, and one that is unknown. The human user enters both words, so reCAPTCHA can verify the user against the known word and learn the unknown word from the user's answer.
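The two-word scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's actual implementation: the function names, the `votes` store, and the `VOTES_NEEDED` threshold are all hypothetical, and the article does not say how many matching answers were required before a word counted as learned.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers we require before
# trusting a transcription. The article does not give a real value.
VOTES_NEEDED = 3

# Guesses collected so far for each unknown word image, keyed by an image id.
votes = defaultdict(Counter)


def normalize(text):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return text.strip().lower()


def check_challenge(control_answer, control_input, unknown_id, unknown_input):
    """Return True if the user passes; record their guess for the unknown word."""
    if normalize(control_input) != normalize(control_answer):
        # The known word is wrong, so the guess for the unknown word is discarded.
        return False
    votes[unknown_id][normalize(unknown_input)] += 1
    return True


def learned_word(unknown_id):
    """Return the agreed transcription once enough users gave the same answer."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= VOTES_NEEDED else None
```

Notice that only the known word decides whether the user passes. The unknown word is never checked, which is exactly why the scheme doubles as a data-gathering tool.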

This is where the cracks start to show: the benefit to Google’s OCR technology puts the security of the system in the passenger seat.

“There is always a battle between usability and security,” says Nan Jiang, a Human-Computer Interaction lecturer at Bournemouth University. reCAPTCHA was designed from the ground up to provide data for OCR. What Google did not anticipate was the same kind of technology being used against it.

An artificial intelligence startup, Vicarious, revealed that it had developed a system to automatically solve CAPTCHAs from several providers (Google included) using deep learning, which processes data to find patterns and improve a model, loosely similar to how a brain works, combined with advanced optical recognition algorithms. The company reported upwards of 90% accuracy, leaving reCAPTCHA and similar offerings extremely vulnerable.

Example of reCAPTCHA v2

In 2014 Google introduced reCAPTCHA v2, with image-based puzzles that required the user to identify the images containing a given object. The change was meant to make the test more accessible, so that non-native English speakers could solve CAPTCHAs more reliably, and to make the challenges more difficult for bots to solve.

These images were taken from Google Maps’ Street View database. Once again, Google designed the challenge to improve its own image detection algorithms, using completed CAPTCHA results to identify common objects like traffic lights, fire hydrants, cars, and more.
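To make the idea of learning labels from completed challenges concrete, here is a minimal Python sketch. It assumes each user's answer is the set of grid tiles they clicked, and that a tile is labeled once a large enough share of users agree on it. The function name and the `min_agreement` value are hypothetical, not details of Google's system.

```python
def label_tiles(selections, num_tiles, min_agreement=0.7):
    """Return the tile indices that enough users marked as containing the object.

    selections: list of sets, one per user, holding the tile indices they clicked.
    min_agreement: hypothetical fraction of users that must agree on a tile.
    """
    if not selections:
        return set()
    labeled = set()
    for tile in range(num_tiles):
        # Fraction of users who clicked this tile.
        share = sum(tile in picked for picked in selections) / len(selections)
        if share >= min_agreement:
            labeled.add(tile)
    return labeled
```

For example, if four users answer a 3x3 "select all traffic lights" grid with `{0, 1, 4}`, `{0, 1}`, `{0, 4, 5}` and `{0, 1, 4}`, the function labels tiles 0, 1 and 4 and ignores tile 5, which only one user selected.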

Cybersecurity researchers were again able to use artificial intelligence and image recognition to break reCAPTCHA v2. In a study presented at Black Hat Asia, a popular cybersecurity conference, the researchers reported that their program could solve CAPTCHAs with a success rate above 60%.

Example of Captcha Breaker Algorithms Identifying Contents of an Image

When the most popular CAPTCHA solution is consistently defeated by the same technologies, it puts a spotlight on the main reason CAPTCHA systems fail to stop bots: a lack of focus on security. Google has split its attention, designing a CAPTCHA that supports its sister technologies as a public data-gathering tool rather than one that serves purely as a defense against bots.

With bots running rampant on the internet, there is an opportunity for a new competitor to come up with an effective solution. Several cybersecurity firms, such as DataDome, Imperva, and Akamai, are working on better ways to address this problem. However, this is also an opportunity for you, the reader, to get involved.

If you’re interested in cybersecurity, you could explore how you would address this issue yourself. Many firms offer bounties for leads or solutions to security problems. Hopefully, a better and more effective solution will be developed in the near future that can put an end to malicious bots and make the internet a safer place.
