Thingnamer

June 6, 2007 | Tate Linden
If you comment on our blog in the future, you'll notice that we've added a CAPTCHA plug-in that asks you to type a couple of words before your post is approved.

Normally we find these programs annoying and would avoid them. Sure, a CAPTCHA only takes an additional 5 seconds or so, and given that we've had fewer than a thousand valid comments on our site, it would have cost you readers less than an hour and a half of time in total. The only benefit would be saving our own precious time and effort, and that isn't much: we use Akismet, so most of the comment spam never reaches us, and the stuff that gets through takes us about 30 seconds a day to eliminate.

So, why are we giving reCAPTCHA a try? Because we love the name and the idea behind the company.

The idea is this (taken from the reCAPTCHA website):
About 60 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? reCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.

To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then, to make them searchable, transformed into text using "Optical Character Recognition" (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect.

...

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

Currently, we are helping to digitize books from the Internet Archive.
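
For the technically curious, here's a rough sketch of the scheme described in that quote, written in Python. To be clear, this is my own illustration, not reCAPTCHA's code: the names (UnknownWord, submit, consensus) and the thresholds (three counted votes, 75% agreement) are hypothetical choices made up for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnknownWord:
    """A scanned word that OCR flagged as unreadable (hypothetical model)."""
    image_id: str
    votes: Counter = field(default_factory=Counter)


def submit(control_answer: str, typed_control: str,
           unknown: UnknownWord, typed_unknown: str) -> bool:
    """Check the known (control) word; only if it matches, count the user's
    reading of the unknown word as one vote."""
    if typed_control.strip().lower() != control_answer.lower():
        return False  # failed the known word, so the other answer is ignored
    unknown.votes[typed_unknown.strip().lower()] += 1
    return True


def consensus(unknown: UnknownWord, min_votes: int = 3,
              min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading once enough people concur, else None.
    Both thresholds are illustrative assumptions."""
    total = sum(unknown.votes.values())
    if total < min_votes:
        return None
    reading, count = unknown.votes.most_common(1)[0]
    return reading if count / total >= min_share else None


if __name__ == "__main__":
    word = UnknownWord("page12_word7")  # hypothetical image ID
    submit("helps", "helps", word, "signing")
    submit("castle", "castel", word, "singing")  # control typo: vote discarded
    submit("morning", "morning", word, "signing")
    print(consensus(word))  # None (only two counted votes so far)
    submit("river", "river", word, "Signing")
    print(consensus(word))  # signing
```

Throwing away the unknown-word answer whenever the known word is wrong is what lets the system trust total strangers; waiting for several matching readings before accepting a word is the "higher confidence" step the quote mentions.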

How cool is that? This company is trying to "recapture" 150,000 hours of human labor per day. Of course, their product isn't everywhere yet, but going after that much lost productivity is admirable, and the cause is worthy. Capturing the text of public-domain books and making it available online is a great goal: thousands (or even millions) of texts can be opened up to people who cannot read or see the printed page, because text in electronic form is far easier to read aloud by software or to translate.
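
And in case you're wondering whether that 150,000-hour figure holds up, the numbers in the reCAPTCHA quote multiply out comfortably above it. A quick check in Python, using only the figures quoted above:

```python
# Figures taken from the reCAPTCHA quote above.
captchas_per_day = 60_000_000   # "about 60 million CAPTCHAs ... every day"
seconds_each = 10               # "roughly ten seconds of human time"

hours_per_day = captchas_per_day * seconds_each / 3600  # seconds -> hours
print(f"{hours_per_day:,.0f} hours per day")  # 166,667 hours per day
```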

As for the name itself, it has just a touch of wit: it sounds an awful lot like a New Englander saying "recapture," and recapturing is exactly what the service does. We are recapturing words that would otherwise be lost to the printed page.

And for those who are interested, CAPTCHA is an acronym/initialism coined at the turn of the century. It stands for "Completely Automated Public Turing Test to tell Computers and Humans Apart," and was trademarked by Carnegie Mellon University. (And yes, "CAPTCHA" is a bit of a stretch, isn't it? Shouldn't it be CAPTTTTCAHA? Or maybe CAPTTCHA? I suppose aesthetics count for something...)

To any other naming blogs (or any other blogs at all) looking for a way to reduce comment spam and make the world a better place: I can't think of a better way to do it than getting reCAPTCHA running on your own site.

Given all this, I no longer think an apology is warranted for putting a CAPTCHA on our site. Sure, commenting takes you five seconds longer... but somewhere, sometime, someone will hear or read the word you identified and be unknowingly grateful. And isn't that payment enough for your time?

Wiseacres need not answer.
4 Comments
Tate Linden June 6, 2007 9:51 AM

Woohoo! I am officially helping digitize a book! (My words were "helps" and "signing.")

Nancy Friedman June 6, 2007 1:20 PM

Welcome back, Tate! We've missed you! I'm going to give reCAPTCHA a try. I got slammed by Italian pornospam last weekend and am currently defending myself with comment moderation. P.S. Thanks for the acronym unscrambling!

Jeffry Pilcher June 7, 2007 5:21 PM

BRB. I'm going to check out Nancy's site...snat

Drew July 10, 2007 12:53 AM

I've also found reCAPTCHA to be a very easy-to-configure anti-spam solution for my blog. I was concerned that most CAPTCHAs are not very accessible, but these guys have added an audio option. Have you had any trouble with users who couldn't make out the words?