Proof of work - an unobtrusive alternative to captchas (bennolan.com)
23 points by clyfe on May 6, 2011 | hide | past | favorite | 40 comments


Proof-of-work systems are cool, but this doesn't work, economically.

A proof-of-work system requires a verifiable amount of computation time before you can do something. The idea is that "honest" computers are mostly idle, but that this should slow down spammers.

The problem is that it may not be possible to seriously inconvenience spammers without seriously inconveniencing honest clients, even if both run native code (e.g. hashcash). The article limits the honest guy to some crappy smartphone's interpreted Javascript running an algorithm that is very much optimized for 32-bit machine words (instead of Javascript's floats). If the smartphone can do the work in 10sec (which is annoying enough), the spammer should be able to do it in at most .01sec: enough to send ~8.6 million messages per day.
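The arithmetic behind that estimate, with the parent's assumed 1000x native-code advantage, is straightforward:

```python
# Rough spam throughput if a native solver is ~1000x faster than phone JS.
# The 10 s and 1000x figures are the parent comment's assumptions.
phone_seconds = 10.0                       # phone's interpreted JS, one proof
speedup = 1000                             # assumed native-code advantage
native_seconds = phone_seconds / speedup   # 0.01 s per proof
proofs_per_day = 86_400 / native_seconds
print(round(proofs_per_day))               # ~8.6 million proofs/day, one machine
```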


I think your statement is too broad. I agree that it wouldn't work on the web, for the reasons that you state. But it is a useful approach in a lot of other situations. For example, I've seen it used successfully in gaming to prevent DOS attacks.

Client Puzzles have a couple of nice properties: the difficulty can be adjusted, and they can be implemented in a completely stateless manner. The last point is the most important, because it is what protects against a [D]DoS attack.


I agree they are cool, but why not use a simpler solution that does the same thing? Require a couple-second server-side wait for repeated requests from the same IP. Even if the spammers had over 1000 IPs (and most don't), this would still kill the speed with which they can mass-request. Doing this would give slow machines the same wait time as everyone else.
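A minimal sketch of that per-IP throttle (the 2-second interval is an illustrative policy, and a real deployment would need expiry of old entries):

```python
import time

last_request = {}    # ip -> timestamp of last accepted request
MIN_INTERVAL = 2.0   # assumed policy: seconds between requests per IP

def allow(ip, now=None):
    """Accept a request only if the IP has been quiet long enough."""
    now = time.time() if now is None else now
    if now - last_request.get(ip, 0.0) < MIN_INTERVAL:
        return False     # too soon: make the client back off and retry
    last_request[ip] = now
    return True
```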


Messing around with your client's performance doesn't seem like a nice alternative to a captcha to me...

There are plenty of people with bloated Windows machines that struggle to perform properly even in a modern browser; I don't think overloading them with extra work would make their experience easy and smooth.

Just my 2 cents.


I wouldn't necessarily suggest it, but imagine if this were implemented at the protocol level and handled with native code built into clients.

There could be a new HTTP request method called "SUBMIT" which works like a POST request except with an added step for proof of work. When making a SUBMIT request, the client opens a connection to the server, sends the initial request header with the SUBMIT method, and receives a proof of work challenge as an HTTP field in the server's response. The client then processes the challenge and sends its proof of work to the server which verifies it and sends an OK. Then the client sends its url-encoded data to the server, the server sends its response data, and the connection closes.

The performance impact and behavior would then be fairly uniform across clients, especially if they all used a standard library of problems and problem-solving code, so you wouldn't have to worry about finding a problem that isn't too hard for poor Javascript implementations to solve while still being too easy for native-code implementations in spamming tools. There could even be an added field for the server to tell the client which problem type is in use, allowing new problem types of different difficulty and performance requirements to be added later without further changes to the protocol.
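A sketch of what that exchange might look like on the wire (the SUBMIT method, the 402 status, and the X-PoW-* fields are all hypothetical, invented here for illustration):

```text
C: SUBMIT /comments HTTP/1.1          <- initial request header, no body yet
C: Host: example.com

S: HTTP/1.1 402 Work Required         <- hypothetical challenge response
S: X-PoW-Challenge: salt=ab12cd; prefix=0000; algo=sha256

C: SUBMIT /comments HTTP/1.1          <- retried with the solved challenge
C: X-PoW-Response: nonce=57123
C: Content-Type: application/x-www-form-urlencoded
C:
C: comment=hello+world

S: HTTP/1.1 200 OK
```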

Personally though, I don't like proof of work solutions. Sure, they add some computational cost to spamming, but the mass implementation and use of any proof of work solution would probably just create a new, lucrative market for botnet hackers to sell their services to. Besides that, it just feels like the wrong solution to me.


This strangely reminds me of how Bitcoin works under the hood -- basically, a proof of work. Anybody up to creating a Bitcoin plugin for browsers as an alternative to captchas? Either as .xpi or a Flash file to be included on the website?


Indeed; these are basically Bitcoin microtransactions. I always thought an "if you really believe this deserves to be ranked higher, invest $0.01 in its prospective well-being" model would be good to base a social news website upon, but real microtransactions have always been halted by the static friction people feel when parting with even a single cent of Real Money. Paying in small amounts of CPU time (that have worth, but don't feel like they do) could get around that.

(Note that, even if this were implemented, you probably wouldn't be doing the work in the browser, but rather just spending (fractions of) coins you had already generated and placed into your bitcoin wallet. Thus, this wouldn't slow down spammers—as they could just buy coins on the open market—it would just make the marginal cost of a single spam transaction more (and hopefully prohibitively) expensive. On the bright side, even if it happened that Bitcoins were too comparatively-devalued to prevent spam, spammers would then, by buying Bitcoins en masse, become the de-facto Bitcoin-to-USD exchange!)


That won't really work. See http://news.ycombinator.com/item?id=2407149 (summary: generating bitcoins takes too long and both client and server would have to trust each other.)


It will work, but the scheme has to be simplified some. Coins can be generated ahead of time, and then the client just pays a 0.001 bitcoin (via browser plugin, automatically) to the server instead of jumping through the captcha hoops. No trust is required, it's just a usual bitcoin payment.


I thought a bitcoin transaction was only considered successful after a sizable number of other bitcoin participants have based their successful computations on it?

If that is correct, bitcoin validation isn't fast enough for instantaneous forum postings when the spammers can be assumed to be as dishonest as fits them.


IIRC, you can almost instantly verify that certain bitcoin wasn't spent someplace else before, which would be good enough. Now, the attacker can use the same bitcoin for several web forms simultaneously, but 1) the scope of this attack is limited by the time window, and 2) if the server discovers a fraudulent transaction in, say, a few minutes, the spammy comment can be automatically deleted then.


This wouldn't work, would it? If the server sends the client a number (n), surely I (as a spammer) could simply pre-calculate the largest integer (p) whose hash starts with 0000. The server cannot verify this is the wrong answer without calculating all the hashes between n and p.

So I can send back p as my answer for all values of n, without needing to do any client side calculation


You're right, the server needs to send additional unique information, as in http://en.wikipedia.org/wiki/Hashcash

    Server:

    uniq_id = long random string
    salt = long random string
    work_number = random integer

    storage[uniq_id] = salt
    send(uniq_id, salt, work_number)


    Client:

    new_number = work_number

    while hashfunc(salt | new_number)[:4] != "0000"
      new_number++

    send(uniq_id, new_number)


    Server:

    salt = storage[uniq_id]
    if no salt
         deny

    delete storage[uniq_id]

    if hashfunc(salt | new_number)[:4] != "0000"
         deny
    else
         allow

May also add timestamps.
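The same exchange as a runnable Python sketch. SHA-256 and the 4-hex-digit difficulty are illustrative choices; the function names are made up here:

```python
import hashlib
import secrets

DIFFICULTY = "0000"  # hex prefix the hash must start with (~65536 tries avg)

storage = {}         # server-side state: uniq_id -> salt

def server_challenge():
    """Issue a unique, salted challenge and remember the salt."""
    uniq_id = secrets.token_hex(16)
    salt = secrets.token_hex(16)
    work_number = secrets.randbelow(2**32)
    storage[uniq_id] = salt
    return uniq_id, salt, work_number

def client_solve(salt, work_number):
    """Grind nonces until the salted hash starts with the target prefix."""
    n = work_number
    while hashlib.sha256(f"{salt}{n}".encode()).hexdigest()[:4] != DIFFICULTY:
        n += 1
    return n

def server_verify(uniq_id, n):
    """One cheap hash checks the work; the salt is deleted to stop replays."""
    salt = storage.pop(uniq_id, None)
    if salt is None:
        return False  # unknown or already-used challenge
    return hashlib.sha256(f"{salt}{n}".encode()).hexdigest()[:4] == DIFFICULTY
```

Note the asymmetry this thread relies on: the client does tens of thousands of hashes, the server does one, and deleting the salt on first use is what makes the pre-computed-answer attack above fail.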


Proof of work surely is a very interesting concept, but I'm not sure it's preferable to other methods.

As far as I'm concerned using up 8 seconds of user computation time (during which you cannot guarantee responsiveness) is just as bad as throwing a cryptic CAPTCHA field in her face.

Methods such as timestamping, honeypots and dynamically added fields yield very good results for bot recognition.


Can you expand on the timestamping method? I thought this could be easily circumvented by adding random delays to the bot requests.


See also:

http://news.ycombinator.com/item?id=2368486

(There are lots of comments; a couple of months ago it would have been easy to see which ones were "interesting", but now you must read them all.)


You are absolutely correct. Timestamping alone does not suffice, but it is another layer of defense which does work. Most bots do not have random delays.
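For what it's worth, the timestamping layer can be stateless if the server signs the issue time into the form token. This is a sketch under assumed names (SECRET, MIN_SECONDS); the 3-second floor is an illustrative guess at "faster than a human could fill the form":

```python
import hashlib
import hmac
import time

SECRET = b"server-secret"  # assumption: key known only to the server
MIN_SECONDS = 3            # assumption: humans rarely submit this fast

def issue_token(now=None):
    """Embed the issue time in the form, signed so bots can't forge it."""
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def check_token(token, now=None):
    """Reject forged tokens and forms submitted suspiciously quickly."""
    ts, sig = token.split(":")
    good = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, good):
        return False
    return (now if now is not None else time.time()) - int(ts) >= MIN_SECONDS
```

As the parent says, a bot can beat this by adding a random delay; it's one cheap layer, not a complete defense.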

I'll try to look up some numbers on this.



Is it possible to use in javascript a memory-bound proof of work (http://www.google.com/search?q=memory+bound+proof+of+work), where the computation speed is bound by memory latency, not CPU speed?


If we could make it work (and I recognise the problems other commenters have noted with this particular implementation) then this would be a big benefit for blind people, for whom CAPTCHAs aren't just inconvenient but impossible. I'm sure they would gladly accept 8 seconds of unresponsiveness while the computer did some verifiable work, if it meant they could pass the test and use the service.


Audio CAPTCHAs solve this problem.


Not entirely - what about deaf and blind people, or those with cognitive disabilities?


If the idea is to create a bottleneck, then why not throttle login at the server? E.g. have a mutex for logging in, only allow one request at a time, and slow that down with a slight wait (say, 1 second). You'll have to balance the exact throttle against the number of legitimate users on the system, and you'd probably want a timeout-and-retry mechanism in place.


I love the idea of login systems that exponentially (or somewhat exponentially) prolong the waiting period: e.g. 1 second, 2, 4, 8 and so on. There should be a reasonable maximum of course, say 5 minutes. That would leave roughly 300 attempts per day (86,400 s / 300 s ≈ 288), which renders brute force useless yet is low enough not to annoy users too much if you explain the motive.


Yes, that's probably a better solution.


The system you propose doesn't work: either you don't lock anyone out, or you lock out a similar percentage of legitimate users and spammers.

Imposing a separate delay on each attempt (but with no global mutex) is actually done in practice; see "greylisting".


"The system you propose doesn't work: either you don't lock anyone out, or you lock out a similar percentage of legitimate users and spammers"

How so? Requests are piled in a queue, waiting for the mutex. If a request can't be completed within a given timeout (say, 3 times the delay), you respond with an error message and the client retries. You could automate the retry, so the user only experiences that it takes time, not a failure.

The biggest problem I see here is that it's a clear DoS attack vector.


If you set your logins/hour limit higher than the actual logins/hour, your scheme does nothing except smooth some peaks by forcing people to wait. If you set it lower, there will always be (a growing number of) people waiting to log in. In either case, you annoy real users and spammers to the same degree.


I'm sure someone has thought about this before, but instead of having the client do work for a certain amount of time, could you just disable submit, set a javascript timeout for some set amount of time, then enable submit after that?

Why does the client need to be processing during the waiting time? Would spambots wait around for the submit button to work?


This works well with big forms, since the timeout would expire by the time the user finishes filling one out (assuming the user isn't put off by the size of the form). With small forms, not so much; the user would be annoyed.

It also penalises users who don't use Javascript.

Edit: This fails against a botnet where computers can be told to wait(10) [seconds] after filling form out.


I wonder if there's any way to use this scheme as a distributed computing tool. Maybe give out 3 work-units, one or two of which you know the result for, and the 3rd is unknown. The two known answers could be used to do the check, and the third actually does the work (and provides more known answers for later).


Don't trust client input!

The random number would have to be sent from the server, else the same hash could be used every time.

Also, MD5 can easily be computed on an FPGA at hundreds of millions of hashes a second. Other algorithms might be more effective, particularly those which use large amounts of memory.

Nice idea though.


Not necessarily. You store which random number you sent to the client in the session on the server. Once it's been solved you invalidate it.


How do you know that the response really is the smallest such integer, rather than a precomputed one (e.g. the largest int with a hash beginning "0000")?


You don't (have to) care about "the smallest", just "any integer with a hash beginning with 0000".


But once the spamming machine has worked that out once, they can submit the same integer every time.

It would be better if the server sent a request saying: send me an integer whose hash begins with npqr, where n, p, q and r are all random characters.


Isn't this what botnets are for?

I'm serious - spammers already use packs of porn-hungry humans to process CAPTCHAs; this will only make their lives easier as they already have the botnets available, and botnets don't need porn!


8 seconds is not unobtrusive.

But maybe browsers could implement something natively.


Isn't the idea of a captcha that only a human can solve it?

This sounds like it could just (handwaving here) be handed to a botnet for solving.


Yeah, that's the idea of CAPTCHAs. In reality they can be solved by bots, given enough computing power and several attempts.



