Friday, October 05, 2012

The Captcha Question

A commenter recently complained that the captchas (the distorted images whose text you have to type in when posting a comment, to show that you are a person and not a program) are too difficult, a problem I have seen elsewhere online. The obvious problem with dropping them is that spam becomes easier, since it can then be posted wholesale by software instead of one comment at a time by a human being.

As an experiment, I turned off the captcha requirement, with mixed results. Judged by the comments sent in (I have Blogger set up to email me copies of all comments), spam increased considerably. Judged by the comments that actually appeared, it didn't: Blogger did a good enough job of filtering that most of the additional spam was eliminated.

There is still a cost, however. If I want to keep track of comments, including comments on old posts, by reading the emailed copies, I have to wade through a lot of spam. And one reason to keep track of them is that Blogger sometimes makes mistakes in both directions, letting some spam through and flagging some real comments as spam. If I never read the comments it thinks are spam, some real ones get eliminated.

Suggestions welcome; for the moment I'm leaving captchas off.

19 comments:

Rohan said...

On my blog I have set it to moderate comments on posts older than two weeks, with no captchas. So I get email for new comments, including new spam, but must go to the Blogger interface to moderate old comments. If I approve an old comment, I get an email of it.

I think this is a pretty decent compromise. Comments are totally open on new posts. You can still comment on older posts, but since spam is more likely on old posts, it cuts down on emailed spam.
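Rohan's age-based rule amounts to a simple routing decision for each incoming comment. A minimal sketch, assuming a hypothetical `route_comment` helper and a fixed 14-day cutoff; Blogger applies its own setting through its dashboard, not through code like this.

```python
from datetime import datetime, timedelta

# Assumed cutoff, matching the "older than two weeks" setting described above.
MODERATION_AGE = timedelta(days=14)


def route_comment(post_published: datetime, comment_time: datetime) -> str:
    """Return 'publish' for comments on recent posts, 'moderate' for older ones.

    Hypothetical helper for illustration only; not part of Blogger's API.
    """
    if comment_time - post_published > MODERATION_AGE:
        return "moderate"  # held in the queue; emailed only once approved
    return "publish"       # appears at once and is emailed immediately
```

For a post published on 5 October 2012, a comment on 10 October is published at once, while one on 1 November goes to the moderation queue.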

Anonymous said...

I'm not sure what the right answer is, but I've noticed CAPTCHAs getting harder and harder over the last couple of years. Either my eyesight is going, or the bots are getting closer and closer to human ability to decipher the challenges.

Anonymous said...

A couple of months ago, Google switched to a new captcha algorithm and it was suddenly much, much easier. That lasted for about two weeks, then it got harder again.

I guess they aim for a certain (human) error rate. If it's hard for people, then it's probably not crackable.

Laird said...

I believe it was my comment which triggered this experiment, and I appreciate the consideration. I don't object to some form of captcha; I recognize its necessity. It's just that I think there are better systems out there, which don't have so many letters and symbols that are essentially illegible even to humans (especially those of us with older eyes!).

Anonymous said...

Automated captcha solvers have indeed been improving, from what I've heard.

Also, some captcha systems (particularly reCAPTCHA) may unintentionally generate hard-to-impossible captchas occasionally. Fortunately they usually let you try again if you fail.

Nick said...

I think there are better types of "captchas" than deciphering hard-to-read letters and numbers. Some of them ask you a question like "which direction is opposite of north" or a simple math problem. These seem to be a better way to go.

Kid said...

@Nick: "I think there are better types of 'captchas' than deciphering hard-to-read letters and numbers. Some of them ask you a question like 'which direction is opposite of north' or a simple math problem. These seem to be a better way to go."

Your naivety is cute. You need a defense system matched to the level of motivation of the spammer. Your family forum needs only to defend against automated spammers that target thousands of forums at a time, so a simple unique question is an effective defense.

Blogger is a very popular blogging platform and attracts highly motivated attackers. It would be trivial to build a database of questions and answers to bypass the simpler system.

Recently, Google's audio captcha system was cracked [1][2], prompting Google to make it significantly harder.

The captchas have become harder for humans only because automated solvers have improved. Perhaps it is still possible to find a captcha that is easy for humans and intractable for computers, but this is an open research problem, not a simple task.

[1] http://www.youtube.com/watch?v=rfgGNsPPAfU [Original presentation, 1 hour video]

[2] http://arstechnica.com/security/2012/05/google-recaptcha-brought-to-its-knees/ [short article about the presentation]

Tom Crispin said...

On the blogs I read, most informative and useful comments come from a relatively small number of repeat visitors.

Maintain a white list. To comment, someone first emails you for permission and once approved you add them to the list. If the privilege is abused you remove them from the list and further comments go to the bit bucket.
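Tom Crispin's whitelist is just a set of approved commenters consulted on every submission. A minimal sketch, assuming commenters are identified by a single name or address string; `CommentWhitelist` and its methods are hypothetical and not part of Blogger.

```python
class CommentWhitelist:
    """Hypothetical whitelist: approved commenters post, everyone else is dropped."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    def approve(self, commenter: str) -> None:
        # Called after the commenter emails for permission and is accepted.
        self._approved.add(commenter.lower())

    def revoke(self, commenter: str) -> None:
        # Called when the privilege is abused.
        self._approved.discard(commenter.lower())

    def accept(self, commenter: str) -> bool:
        # Comments from anyone not on the list go to the bit bucket.
        return commenter.lower() in self._approved
```

Names are lowercased so that "Laird" and "laird" count as the same commenter; a real system would more likely key on a verified account or email address.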

SheetWise said...

You could reverse the form element names in the submit and target templates, and respond with a custom ignore page -- "Thank You!" -- when you receive name/mail submissions that are backwards.
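SheetWise's trick relies on bots filling form fields by their HTML names, while humans go by the visible labels. A minimal sketch of the server side, assuming (hypothetically) that the field labelled "Name" is given `name="mail"` in the HTML and the field labelled "Email" is given `name="name"`; the email pattern is a rough assumption, not a validator.

```python
import re

# Rough, assumed pattern: enough to tell an address from a person's name.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def handle_submission(form: dict[str, str]) -> tuple[str, bool]:
    """Return (page shown, whether the comment was kept). Hypothetical helper.

    The field labelled "Name" is named "mail" in the HTML, so a human's name
    arrives under "mail"; a bot filling fields by HTML name puts its email there.
    """
    typed_under_name_label = form.get("mail", "").strip()
    if EMAIL_RE.fullmatch(typed_under_name_label):
        # Backwards submission: show the custom ignore page, silently drop it.
        return "Thank You!", False
    return "Comment posted", True
```

Because the bot still sees "Thank You!", it has no signal that its comment was discarded.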

The Sanity Inspector said...

Could be worse. There is apparently a crew of spammers targeting The Atlantic, among others. They are real people, posting real, on-topic comments in native English. But mooshed up into the comment is a link to their online store of counterfeit luxury goods.

Roman said...

As was said in one of the previous comments, a partial solution is to show captchas only to those who are not on the whitelist (first-time commenters).

Expanding this idea, there should be independent commenting systems that build a database of spammers across various websites and provide blog authors with automated protection. In fact, there is one: it's called Disqus, and you can probably install it on Blogger too. My experience using it is that there's virtually no spam on it.

Anonymous said...

I understand that some captchas also allow scanned documents to be turned into text, so at times you are not only solving a captcha but also transcribing part of a scanned document. Talk about crowdsourcing!

Anonymous said...

I want to illustrate that people's understanding of the state of the art of computing is behind the curve.

Regarding the earlier suggestion ("which direction is opposite of north" or a simple math problem), I introduce Wolfram Alpha, which can already solve both. If you enter "opposite of north" into it, you unambiguously get the following:

Input interpretation: north (English word) antonym
Result: south

And it does hard math problems as well. Entering "int sinx/x dx" gives all kinds of solutions like definite integrals, indefinite integral, graphs, power series etc.

Another example:
"population of algeria at jfk assassination." Answer in 3 seconds: 11.2 million

Or "high tide in los angeles at full moon next month." Answer: +3.6 feet

Mark Bahner said...

Hi,

There are different levels of captchas. Roger Pielke Jr.'s blog's captchas are virtually impossible.

The Frontier webmail's captchas are fairly easy.

My recommendation would be to try to find captchas that are fairly easy (for a human).

Mark Bahner said...

I'd be interested to know whether computers could answer questions about pictures. For example, if there were a smiley face and the question asked, "What is this person doing?"

Or if one showed a cartoon of a cactus and a daisy, without saying what they were, and asked, "Which has sharp needles?"

Or a cartoon of the moon (craters) and sun (rays), with the moon colored pink, and asked, "What color is the moon?"

Or showed a triangle, a square, and a circle, and asked, "Which has the fewest sides?"

I don't think computers would be good at that.

DarQ DawG said...

I posted something about using honey pot traps to catch spam bots in the comments under your Economists and Virtual Worlds post.

I'll post it again here:

http://graphiclineweb.wordpress.com/2012/02/26/honeypot-your-blog/

Now I'm not exactly sure how to use it specifically in the comments section, but I'm sure Project Honeypot has information on it.

Obviously, it won't stop human spammers, but neither will CAPTCHA.

I hope this helps.
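One common form of the honeypot idea, which may or may not match what the linked article describes, is a form field hidden from human visitors with CSS: people leave it empty, while naive bots that fill in every field do not. A minimal sketch, assuming a hypothetical field name `website` and hypothetical helper names; as noted above, it does nothing against human spammers.

```python
# Hypothetical field name; anything a form-filling bot is likely to complete works.
HONEYPOT_FIELD = "website"


def comment_form(action: str) -> str:
    """Render a comment form whose honeypot field is hidden from human visitors."""
    return f"""<form method="post" action="{action}">
  <textarea name="comment"></textarea>
  <div style="display:none">
    <input type="text" name="{HONEYPOT_FIELD}" tabindex="-1" autocomplete="off">
  </div>
  <button type="submit">Post</button>
</form>"""


def is_probable_bot(form: dict[str, str]) -> bool:
    """Humans never see the hidden field, so any value in it suggests a bot."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

`tabindex="-1"` keeps keyboard users from tabbing into the hidden field by accident, so only software that ignores the page's styling tends to fill it.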

Robert Wenzel said...

Read your comments in Blogger rather than in the emails; that way you can scroll through them quickly and even delete spam.

Izzy said...

I agree that captchas are problematic. I am dyslexic, and letters dance and wriggle around as it is, let alone when they are twisted into a captcha! As for the audio, that beached whale/Dalek symphony is almost worse!

Has anyone here tried captcha bypass browser extensions? I have started using one called rumola and it seems really reliable and effective and is certainly solving my captcha woes!

Kid said...

@Mark Bahner One of the simplest ways to bypass captcha systems is to build a database of questions and answers. Such a database is cheap to build and could easily have millions of entries. For this reason, Blogger's captchas are generated by an automated process capable of creating a very large number of unique images. Your suggestion works so long as you only need a small number of different questions. For a high profile target like Blogger it is not an appropriate defense.
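Kid's point about question-and-answer databases can be made concrete with a small simulation. Assuming, hypothetically, that challenges are drawn uniformly from a fixed pool and that the spammer has already recorded answers to some of them, the chance of answering a fresh challenge from the table is roughly the recorded fraction of the pool. The function name and parameters are illustrative only.

```python
import random


def lookup_table_hit_rate(pool_size: int, recorded: int,
                          trials: int = 10_000, seed: int = 0) -> float:
    """Estimate how often a recorded-answers table solves a random challenge.

    Challenges are numbered 0..pool_size-1 (an assumed uniform pool);
    the spammer is assumed to know answers to challenges 0..recorded-1.
    """
    rng = random.Random(seed)
    recorded = min(recorded, pool_size)
    hits = sum(1 for _ in range(trials) if rng.randrange(pool_size) < recorded)
    return hits / trials
```

With a pool of 50 hand-written questions, all of them recorded, every challenge is answered (`lookup_table_hit_rate(50, 50)` is 1.0); with a billion generated images and a million recorded, the hit rate falls to around one in a thousand, which is the reason for generating challenges automatically.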