In recent weeks, we have had a handful of clients see an increase in spam. It is coming in through the contact forms on their websites. Often, this was the result of a site we inherited from another developer or host. The website had forms not secured with a captcha. Yet, the recent increase is occurring on sites that have captchas in place. What is going on? Well, first a little background might be helpful.

What is a captcha?

Captchas are those strange fields at the bottom of many website forms (contact us, registration, etc.). The field asks you to prove that you are not a robot. Sometimes completing the captcha is as simple as checking the "I am not a robot" check box. Other times you get a grid of nine pictures and have to identify all the pictures of a fire hydrant or crosswalk or bus or truck or... you get the idea.

When captchas were first developed they were often simple math problems or squiggly letters. In recent years, Google's reCaptcha has been the dominant offering on websites and for several good reasons:

  • reCaptcha is easy for developers to set up
  • The "problems" are easy for end users to solve. Sometimes those old squiggly letter captchas were really hard to read.
  • The service is free to use (although this is changing for enterprise users)
  • It is effective. Google's programming has done a good job of letting humans finish and submit online forms while preventing bots from doing so.
  • Google has the resources to keep developing it consistently.

A newer version of their captcha offering is "invisible reCaptcha". Google uses data about how a user is interacting with a page to determine if they are a bot or a human. Things like, "how fast did you fill out the form?" No check boxes needed. When reCaptcha isn't sure whether it has a bot or a human, the invisible captcha falls back on the old strategy of having humans identify pictures. Version 3 goes a step further: it never shows a puzzle and instead gives the website a score for how likely the visitor is to be human.
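For the developers reading along, here is a minimal sketch of one behavioral signal of the kind mentioned above: how fast the form was filled out. This is not Google's method, which is not public. The `Submission` class, the `looks_automated` function, and the three-second threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: a person rarely completes a contact form this fast.
MIN_SECONDS_TO_FILL = 3.0


@dataclass
class Submission:
    rendered_at: float   # Unix time when the server rendered the form
    submitted_at: float  # Unix time when the completed form came back


def looks_automated(sub: Submission, min_seconds: float = MIN_SECONDS_TO_FILL) -> bool:
    """Return True when the form was completed implausibly fast."""
    elapsed = sub.submitted_at - sub.rendered_at
    # A negative elapsed time means the timestamp was tampered with,
    # so it is also treated as automated.
    return elapsed < min_seconds
```

A real service would weigh many signals together. A single timing check like this is easy for a bot author to defeat by simply waiting, so treat it as one ingredient and not a complete defense.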

Why do we need captchas?

For site owners, the purpose of the captcha is to reduce or eliminate the spam that comes in through their contact forms. Spammers write programs (bots) that try to find and fill out online forms.

Why do spammers bother trying to fill out forms?

Most of the time, spammers are trying to improve search engine performance for their websites. Incoming links are an important part of Search Engine Optimization (SEO) strategy. Search engines like Google, Bing, or DuckDuckGo evaluate how important a website is, based partly on how many other websites are linking to it. So if a spammer can get a link from your website to theirs, it increases the odds that their website will show up earlier in search results. The way they hope to get that link is by filling out a form on your website. Many times registration forms let you include a link to your homepage. Even better, comment forms will post the information entered directly to your website. Those comments include links to the spammer's website and, ta-da, you are helping their SEO.
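Because the link is the whole point of this kind of spam, one simple check a developer can add is to count the links in a submission. The sketch below is a deliberately crude illustration. The function names and the zero-links policy are hypothetical, and the pattern will miss links that are disguised or written without "http" or "www".

```python
import re

# Matches http(s) URLs and bare "www." links. Deliberately simple.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Hypothetical policy: a plain contact form message should not need any links.
MAX_LINKS_ALLOWED = 0


def count_links(text: str) -> int:
    """Count the link-like strings in a submitted message."""
    return len(LINK_PATTERN.findall(text))


def needs_review(message: str, max_links: int = MAX_LINKS_ALLOWED) -> bool:
    """Flag a message for manual review when it carries too many links."""
    return count_links(message) > max_links
```

Flagging for review, instead of silently deleting, keeps a legitimate customer who pastes in a link from being lost.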

If this seems shady, it is. Spammers are trying to use the good and legitimate reputation of your business or organization to trick search engines. They want to borrow that reputation to get better search visibility. Spammers that engage in this behavior are often trying to promote websites from less reputable industries. This includes things like gambling and men's (ahem) health and (ahem) entertainment. You can understand why we don't want these search terms on our site. Those industries are not exactly our target market. They are not yours either, which is why it is important to keep their spam off your site.

Why separate the bots from the humans?

Filling out forms is boring. It is VERY inefficient for spammers to fill out forms manually. So, they write programs, or bots, to do it for them. A bot is just a computer program designed to automatically fill out forms. The bots aren't as smart as a human. So, if we have some kind of test like a captcha, we can pretty much eliminate any bots from posting spam on our websites. The problem is that there is a financial incentive for spammers to create smarter bots. Over time they get better at filling out those captchas. So there is a bit of an arms race between website developers and spammers. Website developers want easy-to-deploy, low-cost, effective captchas. Spammers want to get links to their websites onto yours.

Why is Google involved with captchas?

There are a number of reasons Google is interested in all this. The first is that they want to provide the best possible search results. If they get tricked into providing low-quality answers, people might start to look for better search engines.

A second, less obvious reason is that every site that has a Google reCaptcha installed is providing information back to Google. They are tracking the IP address of every site visitor. That address connects to their huge store of other data. That includes what other sites you have visited, when you visit them, and where you are visiting from.

Also, all that picture identifying is more than just an interesting puzzle. The visitors to your site are also working for Google every time they complete a captcha. Your visitors are helping Google's computers learn what a fire hydrant or a bus or a crosswalk looks like.

If everything is working properly, you, as a website owner, get less spam from your forms. Google gets better search results, more data about your site users, and free work in classifying images.

What is hCaptcha and why would you use it?

To the extent that Google has become a little less effective at filtering out the bots, it is helpful to entertain alternatives. Google will make improvements to keep its position as the industry leader. Our data is completely anecdotal. We never used to get complaints from our clients that were using reCaptcha on their forms. Now we sometimes do.

hCaptcha is a new entry in the spam blocking space. It has several advantages. The company behind it is much smaller than Google. That means it is less beneficial for spammers to program their bots to defeat it. Not enough sites use it yet to make it worthwhile. This could change over time, but right now it is an advantage.

hCaptcha is owned and operated by Intuition Machines. They do not have the huge data sets that Google does. They cannot cross link your captchas to analytics, mobile apps, maps, search history, etc. the way Google can.

hCaptcha still relies on image classification work. They are much more direct about this on the home page of their website. They re-sell the image classification services to other industries. They do it without attaching all the personal information of your website visitors.

There is a small incentive provided by hCaptcha for the image classification work that is done on your website. They provide credit in the form of their own cryptocurrency called Human Tokens (HMT). You can use these tokens to buy image classification services back from Intuition Machines. Or you can donate the tokens to the Wikimedia Foundation. That is the organization behind wikipedia.org. It is easy to see the benefit they derive from being able to classify images.

If hCaptcha is working properly on your site, you as a site owner get less spam on your forms. Your site visitors get more privacy. Intuition Machines gets an image classification service that they can sell at a profit. You can get a small incentive to buy services or donate them to the Wikimedia Foundation.
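For developers, the server-side half of a captcha check has the same general shape for hCaptcha as for reCaptcha. The browser submits a token along with the form, and your server posts that token plus your secret key to a verification endpoint. The sketch below is a minimal illustration using only the Python standard library. The endpoint URL reflects our understanding of the hCaptcha documentation and should be confirmed there, and the function names are our own, not part of any official library.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm against the current hCaptcha documentation.
VERIFY_URL = "https://api.hcaptcha.com/siteverify"


def build_verify_request(secret: str, token: str) -> urllib.request.Request:
    """Build the POST request that asks the service to validate a token."""
    body = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    return urllib.request.Request(VERIFY_URL, data=body, method="POST")


def is_human(raw_json: str) -> bool:
    """Interpret the JSON reply; anything malformed or unsuccessful is rejected."""
    try:
        reply = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    return isinstance(reply, dict) and reply.get("success") is True


def verify(secret: str, token: str, timeout: float = 5.0) -> bool:
    """Make the network call. Requires real keys, so it is not exercised here."""
    request = build_verify_request(secret, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_human(response.read().decode("utf-8"))
```

Rejecting anything that is not an explicit success means a malformed or unexpected reply never lets a submission through by accident.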

Must we live forever with captchas?

There are other strategies for reducing spam on website forms. Captchas are simply the most visible. We are also huge fans of Project Honeypot. The details are outside the scope of this discussion, but it deserves an honorable mention.

Project Honeypot is even better than captchas. Its operation is invisible and behind the scenes. We have been using it for over a decade. There is also no data collection or third party profit motive. It is a community of web developers helping each other identify and block bad actors online.

If you are a website owner, ask your developer to look into Project Honeypot for your site. If you are a developer, it is a great resource. Your sites can enjoy Project Honeypot protection with very little effort. It is also very easy to give back by setting up some honey pots of your own.
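For developers curious about what "very little effort" looks like, Project Honeypot offers a DNS-based lookup called http:BL. As we understand its documentation, you query a hostname made of your access key, the visitor's IPv4 address with its octets reversed, and the zone `dnsbl.httpbl.org`. An answer such as `127.3.5.1` encodes days since last activity, a threat score, and a visitor type. The sketch below only builds the query and decodes the answer. The class and function names are hypothetical, and the field meanings should be checked against the official http:BL documentation.

```python
import ipaddress
from typing import NamedTuple, Optional

# Visitor-type bit flags, per our reading of the http:BL documentation.
SUSPICIOUS = 1
HARVESTER = 2
COMMENT_SPAMMER = 4


class Verdict(NamedTuple):
    days_since_seen: int
    threat_score: int
    visitor_type: int


def build_query(access_key: str, visitor_ip: str) -> str:
    """Build the DNS name to look up: key, reversed IPv4 octets, then the zone."""
    octets = str(ipaddress.IPv4Address(visitor_ip)).split(".")
    return ".".join([access_key, *reversed(octets), "dnsbl.httpbl.org"])


def parse_answer(answer: str) -> Optional[Verdict]:
    """Decode an answer such as '127.3.5.1'; return None if it is not valid."""
    parts = answer.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return None
    first, days, threat, kind = (int(p) for p in parts)
    if first != 127:
        return None
    return Verdict(days, threat, kind)


def is_comment_spammer(verdict: Verdict) -> bool:
    """True when the comment-spammer bit is set in the visitor type."""
    return bool(verdict.visitor_type & COMMENT_SPAMMER)
```

In practice the query name would be resolved with an ordinary DNS lookup such as `socket.gethostbyname`, and a failed lookup would mean the address is not listed.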

Project Honeypot is great. We use it in conjunction with captchas. Together they provide the best hosting experience for our clients.

As in life offline, there will always be tension. There are those trying to make things better for everyone. There are those taking advantage of others for their own benefit.

InterGen is always seeking the most ethical resources to serve our clients. If you share these values, learn how our approach to web development and hosting can serve your organization.
Contact us today.