Saturday, January 29, 2005

Who's This Guy Bayes Anyway?

At the SPAM working group, we spent a few minutes discussing the difference between a Turing Test and a Bayesian filter. Here are some background articles that might be helpful.

The Quest for Meaning
In 2000, Wired Magazine ran a great article on the efforts of a UK company called Autonomy and how they were using the ideas of the Reverend Thomas Bayes to build a next-generation tool for extracting meaning from unstructured text data.
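For anyone wondering what a Bayesian filter actually computes, here is a minimal sketch in Python of a naive Bayes spam scorer. It is only an illustration of Bayes' rule applied to word counts, not Autonomy's method or any particular product's; the function names, the tiny training set, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def spam_probability(text, word_counts, label_counts):
    """Return P(spam | words) using Bayes' rule with add-one smoothing."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # Prior: fraction of training messages carrying this label.
        score = math.log(label_counts[label] / total)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Smoothed likelihood, so an unseen word never gives probability zero.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Turn the two log scores into a normalised probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

# Hypothetical training data, made up for this example.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for the working group", "ham"),
    ("lunch on saturday", "ham"),
]
word_counts, label_counts = train(TRAINING)
```

Calling `spam_probability("buy cheap pills", word_counts, label_counts)` gives a score above one half, while a message like "working group meeting" scores below it. A real filter would train on thousands of messages and pick its own threshold.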

The Turing Test

Snipped: The Turing Test was introduced by Alan M. Turing (1912-1954) as "the imitation game" in his 1950 article (now available online) Computing Machinery and Intelligence (Mind, Vol. 59, No. 236, pp. 433-460), which he boldly began with the following sentence:

I propose to consider the question "Can machines think?" This should begin with definitions of the meaning of the terms "machine" and "think."

The Turing Test is meant to determine whether a computer program has intelligence.

Many spam filters use a challenge-response system to weed out automatically generated email messages. They reply to an unknown sender with a challenge that only a human can meet, thus defeating the automated programs that generate spam.
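The challenge-response idea can be sketched roughly as follows. This is a simplified illustration in Python, not how any specific filter is built: the class name, the method names, and the use of a random token as a stand-in for the human-only challenge are all assumptions for the example.

```python
import secrets

class ChallengeResponseFilter:
    """Hold mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.pending = {}      # token -> (sender, held message)
        self.approved = set()  # senders who have answered a challenge

    def receive(self, sender, message):
        """Deliver mail from approved senders; hold the rest and issue a challenge."""
        if sender in self.approved:
            return ("delivered", message)
        # The token stands in for a puzzle that only a human could solve.
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        return ("challenged", token)

    def respond(self, token):
        """Release the held message if the answer matches a challenge we issued."""
        held = self.pending.pop(token, None)
        if held is None:
            return ("rejected", None)
        sender, message = held
        self.approved.add(sender)
        return ("delivered", message)
```

The first message from a new sender comes back as `("challenged", token)`. Once the sender answers with that token, the held message is delivered and later mail from the same address goes straight through. An automated spam program that never answers simply leaves its message in the pending pile.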


Anonymous said...

I'm a great fan of the site, which describes in some detail the way these "defeat the machines" schemes work; there is mention of Turing there. It's really the opposite of Turing: not trying to find a computer that has human intelligence, but trying to block a computer because it does not. CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart". Apparently machines can break some of them at a rate of nearly 80%, which is not to say that they could pass a real Turing test. There are also audio CAPTCHAs for people who can't see well enough to do the visual versions...

John Gregory

Hank J said...

The item that kicked off this discussion was a reference by Elizabeth Bowles to heuristic e-mail filters. I was asked to define them and gave a definition that was really not right since it mixed up "heuristic" and "Bayesian." My bad! Check out the excellent discussion of heuristic at

Anonymous said...

Unfortunately, hackers figured out that they could set up porn sites and, in strip-per-answer fashion, get losers from trailer parks to answer the CAPTCHA questions for them, and thereby proved they could dodge any defense so coded if they want to badly enough :-(

Time for the good-guy side to try again.