Wednesday, November 14, 2007

Hacking Freerice.com: A Program to Feed the World

While I was working on some changes to Twittervision yesterday, I saw someone mention freerice.com, a site where you can go quiz yourself on vocabulary words and help feed the world. How? Each word you get right gives 10 grains of rice to, one hopes, someone who needs it.

The idea is that you will sit there for hours and look at the advertising from the do-gooder multinationals who sponsor it. Which I did for a while. I got up to level 44 or so and got to feeling pretty good about Toshiba and Macy's.

It occurred to me though that my computer could also play this game, and has a much better memory for words than I do. In fact, once it learns something, it always chooses the right answer.

So I wrote a program to play the freerice.com vocabulary game. In parallel. 50 browsers at a time. Sharing what they learn with each other. Cumulatively.

It's a multithreaded Ruby program using WWW::Mechanize and Hpricot. Nothing terribly fancy, but it does learn from each right and wrong answer, and after just a few minutes seems to hit a stride of about 75-80% accuracy. And a rate of about 200,000 grains of rice per hour (depending on the speed of your connection).
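
The learning step can be sketched roughly as follows. This is a minimal illustration, not the code from freerice.rb: the class `AnswerMemory`, its method names, and the made-up word list are all assumptions, and the quiz is simulated locally rather than fetched from the site.

```ruby
# Hypothetical sketch of "learn from each right and wrong answer".
# AnswerMemory and its methods are illustrative names, not from freerice.rb.
class AnswerMemory
  def initialize
    @right = {}                              # word => known correct definition
    @wrong = Hash.new { |h, k| h[k] = [] }   # word => definitions ruled out
  end

  # Use a known answer if there is one; otherwise guess among the
  # choices that have not already been ruled out.
  def choose(word, choices)
    return @right[word] if @right.key?(word) && choices.include?(@right[word])
    candidates = choices - @wrong[word]
    candidates = choices if candidates.empty?
    candidates.sample
  end

  # Record the outcome of a guess so later rounds can use it.
  def learn(word, guess, correct)
    if correct
      @right[word] = guess
    else
      @wrong[word] << guess unless @wrong[word].include?(guess)
    end
  end

  def known?(word)
    @right.key?(word)
  end
end

# Simulated quiz with a made-up word list (no network access).
quiz    = { "feline" => "cat", "canine" => "dog" }
choices = %w[cat dog bird fish]
memory  = AnswerMemory.new

20.times do
  quiz.each do |word, answer|
    guess = memory.choose(word, choices)
    memory.learn(word, guess, guess == answer)
  end
end

puts quiz.keys.all? { |word| memory.known?(word) }
```

Because wrong guesses are remembered as well as right ones, each word in this toy setup is settled after at most four attempts; the real site presumably has a far larger vocabulary, so accuracy climbs more gradually.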

UPDATE: With some tuning, the script is now able to push out about 600,000 grains of rice per hour, which, according to the statistic of 20,000 grains per person per day, is enough to feed about 720 people per day! If one thousand people run this script, it will (allegedly) generate enough to feed 720,000 people per day.
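
For anyone who wants to check the arithmetic, here is a small worked version. The constant and method names are made up for illustration; the only inputs are the figures quoted above (600,000 grains per hour and 20,000 grains per person per day).

```ruby
# Figure quoted in the post; the constant name is illustrative.
GRAINS_PER_PERSON_PER_DAY = 20_000

# People fed per day by one machine running around the clock.
def people_fed_per_day(grains_per_hour)
  grains_per_hour * 24 / GRAINS_PER_PERSON_PER_DAY.to_f
end

one_machine = people_fed_per_day(600_000)
puts one_machine          # one machine
puts one_machine * 1000   # one thousand machines
```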

Before you go off on me, disclaimer: Yes, I realize this program subverts the intent of the freerice.com site. I've released this not to "game" freerice.com but simply to show a flaw in their design and have a little fun at the same time. If what they are after is human interaction, this design doesn't mandate it. That's all I'm saying.

Run it for a while and see how many people you can feed!

Prerequisites:

  • Ruby (Linux, OS X, Other)
  • Rubygems
  • gem install mechanize --include-dependencies


Download the code

38 comments:

Dave Troy said...

New version of the code posted! Being unfamiliar with WWW::Mechanize I wasn't aware of some efficiencies it allowed... much more efficient parsing now.

Enjoy!

Unknown said...

You beat me to it! I realized the same thing when I was struggling to get above level 35 - maybe I won't be able to get above level 40 without a few years on freerice.com, but I bet I could write a program that can.

I was about to start writing a program in Perl using the LWP modules, when I did a Google search and found your blog. I still plan on making my own version though as a challenge.

quangntenemy said...

Sending 50 requests at a time? I think you're abusing it. If 1000 people launch the program simultaneously I think we'll have a DDOS attack :P

Dave Troy said...

quangntenemy: Be realistic. This script doesn't do any more or less than 50-75 simultaneous users can do. In fact, it's tough to distinguish between this script and a comparable number of live users. Are you saying the site can't (or shouldn't) handle the kind of exponential growth it says it is inspiring?

You're merely calling attention to what a ridiculous concept this is. 1) If they can't handle an unlimited number of users, then the site is broken as a concept, 2) If they don't want bot and script clients to game it, they should make a decent attempt to limit them, and disclose their policies to their advertisers and to the public.

So far they have done nothing to limit scripting, presumably because it inflates their numbers to the press and to advertisers.

All they have succeeded in doing so far is making the statement that it is possible to write computer programs to "feed" the hungry. If that is in fact the case, then we should take it to its logical conclusion and quickly dispatch the problem. My script is very efficient.

If this is not the case -- if they don't want a large number of scripted users -- then they should stop dissembling, disclose their finances, and work towards the direct donation of corporate resources to people who need them, rather than acting as an opaque, margin-making middleman.

Furthermore, it's entirely unclear that the UN WFP, or freerice.com, is going about this in a way that is actually beneficial. Subsidies, whether from corporations or middlemen, always distort markets, and one must seriously question whether there are legitimate food producers who are being harmed by this subsidy. It is also not guaranteed that the rice that is being purchased and donated is produced ethically.

So, I am asking questions. And my initial reaction is this: If they are making it possible to write programs that produce food, let's write and distribute the most effective ones possible, so as to force this question to its logical conclusion. There is no other moral answer.

Anonymous said...

Hi,

Nice script... :) like quangntenemy, who has a java ricebot (http://www.freewebs.com/quangntenemy/freerice/index.html), i also have a ricebot, in python (http://smokyflavor.wikispaces.com/RiceMaker), so i'm going to join the discussion. ;)

to make a response to your last post: yes, it is rather easy to distinguish between running your script, and running a single-request-per-second script by 50 people - your requests all come from the same IP! :) so, if freerice decides that it wants to block "obvious" bots, your bot is much easier to spot in the logs.

that said, i think you make a credible argument, overall.

my question is more of a technical one - nobody has a 50-core cpu, so what's the point of running 50 threads each with 1 second delay, when you'd achieve the exact same amount of rice per unit of time running 1 thread with 1/50th of a second delay (on a single-core cpu. scale the math accordingly for a 2 or 4 core machine).

[now that i'm curious, i'm gonna time my ricemaker to see how many questions it can do without the enforced loop delay... :)]

quangntenemy said...

I understand your intention. But I just want to remind you that when you try to answer repeatedly from the same session, you'll see an error message saying "We can't process your rice donation that fast" - I think that's because they want to set a delay for every user. You're trying to get around that by using 50 different sessions. I think the FreeRice people wouldn't like it.

Anonymous said...

ok, never mind, i now know why you are doing 50 threads. i just tried running my ricemaker without loop delay, and got a bunch of "sorry, we are unable to process rice donations so fast" messages from freerice.

well, that explains it. :)

Dave Troy said...

Some responses to quangntenemy and others:

1) This script generates each 'browser' with a different user agent. This would appear pretty much identical to many simultaneous users from behind a large firewall or NAT gateway. So, I stand by my assertion that it is difficult to accurately distinguish between this script and its "multiple users" and a large number of users behind a NAT or firewall.

2) Regarding your question about 50-core CPUs, this program uses Ruby and Ruby does not use native threads. So, number of cores is irrelevant in this case; this program would run multithreaded on DOS (yeah, they have Ruby for DOS!) :) Threading is used in this program just as a construct for parallelism which would otherwise be hard to achieve.
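
A rough sketch of threads used purely as a concurrency construct, with the waiting overlapped rather than the computing. This is not the freerice.rb code: the constants, the shared `answers` hash, and the `sleep` standing in for a network round trip are all assumptions. (Later Ruby versions map threads to native threads but still serialize Ruby code behind a global lock, so the waits overlap either way.)

```ruby
require "thread"

PLAYERS = 5      # the post uses 50; kept small here
ROUNDS  = 3
DELAY   = 0.05   # stand-in for network latency plus the per-session delay

answers = {}         # answer list shared by every simulated player
lock    = Mutex.new  # guards the shared hash and counter
played  = 0

started = Time.now
threads = Array.new(PLAYERS) do |id|
  Thread.new do
    ROUNDS.times do |round|
      sleep DELAY    # pretend to fetch a question and submit an answer
      lock.synchronize do
        answers["word-#{id}-#{round}"] = "definition"
        played += 1
      end
    end
  end
end
threads.each(&:join)
elapsed = Time.now - started

puts "#{played} rounds in #{elapsed.round(2)}s"
```

Run one after another, the same rounds would take about PLAYERS * ROUNDS * DELAY seconds; with threads the waits overlap, which is the whole point of using many sessions at once.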

3) If the freerice people wouldn't like this, then they should change their site so that it can't be so easily gamed. I've pointed out exactly how the site would need to be fixed to prevent this kind of gaming.

That said, if the site cannot be protected from gaming, then I return to my previous assertion that it's a ridiculous concept, trivializing the economics of a global catastrophe.

Rather than tying up computer bandwidth and CPU solving a non-problem, why don't we put people's energies towards a more efficient solution?

Anonymous said...

well... i was curious to tinker around with threads in python, so i made my ricemaker multithreaded, too. heh. thanks for giving me the impetus to do it. ;)

check it out, if you care to:
http://smokyflavor.wikispaces.com/RiceMaker

i set the default thread count to 10, but it's configurable through cli args.

by the way, you say you are running 50 threads, with 1.1sec delay, and get 650k grains per hour... with those numbers, you /should/ be getting about 1.6 million grains per hour. (3600*50/1.1*10), so, it seems you're running into some processing bottlenecks. you might be able to achieve the same grain throughput with reduced thread count...
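
The estimate above works out like this (a sketch only; the method name is made up, and the 10 grains per answer is the figure the script used at the time):

```ruby
# Upper bound on grains per hour if every thread answered one question
# per delay interval with no processing or network overhead.
def theoretical_grains_per_hour(threads, delay_seconds, grains_per_answer = 10)
  3600.0 * threads / delay_seconds * grains_per_answer
end

ideal      = theoretical_grains_per_hour(50, 1.1)
observed   = 650_000
efficiency = observed / ideal

puts format("ideal: %d grains/hour", ideal.round)
puts format("observed is %.0f%% of ideal", efficiency * 100)
```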

Dave Troy said...

smokyflavor -- thanks for the feedback. I am planning to do a post with some benchmarks for various machines and their "rice throughput".

I am using a Mac Pro (quad proc, 2.66Ghz) to get the 650K grains/hour. It's nearly the same on my MacBook Pro, and about half that on my MacBook.

Increasing the threadcount to 75 or 100 doesn't seem to produce measurably more throughput, so you're likely right the ideal number is probably somewhat less than 50, but may vary depending on the machine, etc.

Also would depend on the freerice server configuration, how many children they are running, etc.

What kind of numbers are you seeing from your multithreaded python script?

Anonymous said...

ok, i ran some testing just for fun. :)

loop delay: random(3,6) = average delay 4.5
(i found that at loop delay random(2,5) i still got some "can't process that fast" errors, so bumped it to 3,6)
15 threads
so we should expect 15/4.5 = 3.33 iterations per second (aka 0.300 sec per iter).

when only using internal dict and random as failover, the actual iterations per second i get is 2.95, (1/2.95 - 1/3.33 = 0.038) seconds off theoretical maximum (i.e., min secs per iter), so guess that is the processing time required to download the freerice site, parse it, check the words against internal dict, etc. all this translates to 106200 grains of rice per hour.

all this uses about 30-40% of the cpu, but in a pretty variable-rate way (this is a single-core pentium 4 mobile 3ghz cpu).

when using internal dict, then wordnet, then dict.org, then random as the failover chain, i get about 2.90 iterations per second. similar cpu usage. this translates to 104400 grains of rice per hour. not a huge diff.

now, let's see what happens as we increase thread count...

with 20 threads, the theoretical max becomes (20/4.5 = 4.44) iter/sec, and i get about 3.91. about 50-60% cpu usage. have we reached a cpu bottleneck? (1/3.91 - 1/4.44 = 0.031) seconds off theoretical min per iteration, so it is NOT yet taking more time to do the processing of the site, etc. making about 140760 grains per hour.

let's bump it to 40 threads. theoretical max is 40/4.5 = 8.89 iter/sec. get 90-98% cpu usage. about 5.8 iter/sec. 0.06 sec off the theoretical min per iter, so we are starting to suffer a bit. this rate translates to 208800 grains per hour. somewhere between 20 and 40 threads (maybe 30? i'm getting too lazy to test :) ), is where we are just starting to hit the cpu bottleneck and degrading performance per thread.
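
the same bookkeeping for the three runs above, as a small ruby sketch (ruby only to match the original post; the variable names are made up, and the inputs are the measured rates quoted in this comment):

```ruby
MEAN_DELAY        = 4.5   # average of random(3, 6) seconds
GRAINS_PER_ANSWER = 10

# Measured iterations per second for each thread count, from the runs above.
runs = [
  { threads: 15, actual_rate: 2.95 },
  { threads: 20, actual_rate: 3.91 },
  { threads: 40, actual_rate: 5.8 }
]

results = runs.map do |run|
  ideal_rate = run[:threads] / MEAN_DELAY                    # iter/sec with zero overhead
  overhead   = 1.0 / run[:actual_rate] - 1.0 / ideal_rate    # extra seconds per iteration
  grains     = (run[:actual_rate] * 3600 * GRAINS_PER_ANSWER).round
  { threads: run[:threads], ideal_rate: ideal_rate, overhead: overhead, grains: grains }
end

results.each do |r|
  puts format("%2d threads: ideal %.2f iter/s, overhead %.3f s/iter, %d grains/hour",
              r[:threads], r[:ideal_rate], r[:overhead], r[:grains])
end
```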

so... you are doing quite a bit better on your quad-core, but given that you have four cores, you /should/ be getting four times the 208k = 832k grains per hour at least, if you tweak your threads and loop delay. :)

that of course assumes your code's efficiency is the same. since your code is obviously different, and written in a different language, you may do better or worse than that. no guarantees. :)

feel free also to run my python script - i have integrated iter/sec (and rice/sec - though it is redundant since i also have %accuracy reported) info into the output so benchmarking should be easy.

quangntenemy said...

Guys, have you looked at the faq page? Seemed like our bots have been taken into consideration.

Couldn’t I just write a computer program to play all day and give a lot of rice that way?

There are two problems with this. First, it overloads our servers so that real people can’t play and learn vocabulary. Second, without real people playing and eventually buying products, it is no longer cost-effective for companies to advertise. Without advertising, we cannot give any rice at all.

I added the adclicking feature to my bot. Not sure if that will change anything but maybe you should take that into consideration as well.

Anonymous said...

sorry, but could you go through how to get this script running in a bit more detail, such as what ruby things to install and where to start the script and how to start it?

thanks

Anonymous said...

quangntenemy: since your script doesn't "eventually buy products", i am not sure i see how it is any better in the long run than a script that doesn't click ads.

if the ads are pay per product-buy, then clearly impact is 0, even in short term.

if the ads are pay per click, eventually the advertisers will see that given the number of clicks, there are awful few buys, and realize that something is amiss, and either drop the rates, or pull the ads, so in the long term no positive impact from auto-click.

if the ads are pay per view, then clicking doesn't do anything either, and short term 0 impact again.

so no matter what the ad structure is, auto-clicking doesn't seem like a helpful or useful thing to do...

-sf

Anonymous said...

They make something nice, and give it away for free. (I also doubt the economics, but it was fun to play, and presumably educational.)

Then you break it, to prove how clever you are. What have you done that's useful?

Dave Troy said...

It's an interesting intellectual exercise to create this kind of code. It should also serve to remind people that there is very little that can be created on the web that cannot also be gamed.

Therefore, this script creates an impetus for site designers to think carefully about how they can generate the kind of usage they say they desire.

This code is nothing more than a glorified bug report that in the best case might generate rice to feed people. What's wrong with that?

Anonymous said...

I wrote something myself and I linked you on my livejournal:
http://darrenism.livejournal.com/44632.html

Anonymous said...

I am not a computer genius, so could you please explain how this works simply?

Dave Troy said...

Yes Dinah, it's pretty simple.

The program simply visits the site and "reads" it just like a human would. It looks at the choices for the word meanings and then makes a guess. If it is a right guess, it remembers the answer and stores it in a list of right answers. So if it encounters that question again in the future, it will get the answer right the first time, always.

In my program, the program launches 50 simultaneous "players" of the game, but all 50 players use the same list of answers. So it learns right answers 50 at a time and thus is very fast and efficient.

The program also comes with a list of nearly all the answers, so no "learning" is now needed. The virtual players can just play. Not rocket science, just an application of the tools that are out there.

Hope this answers your question!

Anonymous said...

I wrote something similar, though I only used a single thread since I assumed my ip would be banned if I didn't appear somewhat like a human user.

Anyways, I decided to dump all of the advertising hrefs into a single file. Once every so often I open it in my browser (I've run across 39 unique advertisements) and I look at them all and read them completely. While I'm still not interested in the products, I've read them, digested them, and, truth be told, paid more attention to them than if they had just been at the bottom of the screen, since I think we all just tune out advertisements anymore.

I'm sure there isn't a 1:1 correlation of the grains accrued on the site and how much money is actually donated, but if my clicks help at all or if the higher numbers increases the marketability of the site and brings more users I feel it was worth the time.

Lastly, those interested in click based charity should check out care2.com or google for others for a bit. I've got a script that goes to about 15 different sites once a day and does 'my part'.

Anonymous said...

Great program...
Only problem, every once in a while it just... Dies.
So, maybe you can fix that.

Dave Troy said...

dark -- I've not seen this behavior. Perhaps the remote end is getting stuck? Can you be more specific? Maybe do some network sniffing (ngrep port 80)?

There's nothing inherent in the design that would make it at risk for deadlock unless the remote end stops responding.

Anonymous said...

Great program! Only problem: it, and ALL programs, now crash because after a while, Freerice kills the connection and you need to reconnect.

Anonymous said...

Can anyone explain to me how to install this and set it up to work? I know the main page said what you need, but I can't seem to do anything right. I have Mac os X leopard. I am running this on a 2ghz Macbook Intel core 2 duo processor. I would greatly appreciate this, it seems like a great program. :)

Dave Troy said...

All you need is ruby (comes with OS X) and the mechanize gem (and its dependencies). Just say "sudo gem install mechanize" and it will install everything you need.

My Mac Pro consistently feeds 28-35 people per hour!

Dave

Anonymous said...

Now it opens up, but it won't "start". The go option under the run menu is greyed out. I am not really sure what to do now.

Dave Troy said...

Uh, not sure what environment you're running this in. Open Terminal, cd to the directory where the program is, and be sure you have already run "sudo gem install mechanize". Once you're sure you've done that you can just type "ruby freerice.rb" and it will launch.

You may wish to google for some basics on programming (and possibly on writing ruby programs) so you have some of the fundamentals covered.

Anonymous said...

When I type in sudo /Users/Ryan/Desktop/freerice.rb

I get:

sudo: /Users/Ryan/Desktop/freerice.rb: command not found

It did not say that one time, but nothing else ever came up.

Dave Troy said...

Please re-read my previous post. You need to use "sudo gem install mechanize" and then "ruby freerice.rb".

Anonymous said...

On freerice.com it says that every time you get a word right, it donates 20 grains of rice. On the program, it goes up by 10. Why is this? Should this actually be doubled?

Anonymous said...

Hi Dave,
I've been poring over your code (and the SmokyFlavor Python ricemaker, incidentally) as a case study in www::mechanize use. Are you familiar with the differences between the Ruby implementation, www::mechanize, and Python mechanize.py? For instance, I can't find an equivalent Agent_Aliases attribute in the mechanize.py code.

Dave Troy said...

Hi -- to the two most recent posters.

1) When the program was created the number of grains per correct answer was 10; if it's 20 now that's a simple change.

2) AGENT_ALIASES is a list of potential user agents that WWW::Mechanize can use. My usage of it is to simply create a new array from which a random choice can be made; I remove the 'mechanize' user agent so that the browsers that are launched appear as multiple different types of browser.

I don't know if the python implementation has a comparable structure, but if not exactly the same, it should be possible to set the user agent to any arbitrary value.
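
Roughly, the idea looks like this. It is a sketch only: the hash below is a stand-in with placeholder names and strings so the example runs without the gem, and `random_alias` is a made-up helper. With the real library, the table is the gem's own AGENT_ALIASES, and the chosen name would be assigned to the agent through `user_agent_alias=`.

```ruby
# Stand-in for the alias table that WWW::Mechanize provides. The keys and
# user-agent strings are illustrative placeholders, not the gem's real data.
AGENT_ALIASES = {
  "Mechanize" => "WWW-Mechanize/0.0 (placeholder)",
  "Browser A" => "ExampleBrowserA/1.0 (placeholder)",
  "Browser B" => "ExampleBrowserB/2.0 (placeholder)",
  "Browser C" => "ExampleBrowserC/3.0 (placeholder)"
}

# Drop the library's own identity so every simulated browser looks like
# an ordinary desktop browser.
browser_aliases = AGENT_ALIASES.keys.reject { |name| name.downcase == "mechanize" }

# Hypothetical helper: pick one alias at random for a new "browser".
def random_alias(aliases)
  aliases.sample
end

5.times { puts random_alias(browser_aliases) }
```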

Anonymous said...

What do you need to make this work on windows?

Dave Troy said...

It should be possible to make work on Windows with a Ruby + RubyGems implementation for Windows. I don't use Windows so I am not sure, but if you figure it out, feel free to post that info here!

X said...

Hey Dave, wanted to drop you a quick note and thank you for stimulating an interesting discussion. I have been wondering about the possibility of 'gaming' the freerice site to feed more people and appears you have created a definitive answer. Also, it seems you have approached the challenge with curiosity, respect for the overall vision of freerice and not a malicious desire to harm the site. Cheers and kudos, Xander

Anonymous said...

Hmm... Good script. Just one thing: after running it for a few minutes and getting quite a bit of rice, I get an error. The error is:
/usr/lib/ruby/1.8/timeout.rb:54:in `rbuf_fill': execution expired (Timeout::Error)
from freerice.rb:167:in `join'
from freerice.rb:167

nobody said...

I found the math section of Freerice to be ridiculously easy to automate, no learning required since computers are good at math. I'm not sure if I should run my program too much though. If the advertisements don't load, does freerice still get money to buy rice with? (I'm getting about 10,000 grains per minute)

Anonymous said...

Please understand - while this is a masterful way to promote your knowledge and skill - your efforts might eventually cause the end to the freerice.com website if they are unable to stop you. In the end you could be potentially hurting the starving people of the world. Now what's more important? You growing the number of people running this script to prove that the website is flawed? Or is it more important to feed the hungry through growth of legitimate users of the site? Do the right thing - destroy any bots or scripts and be honest. The makers of Freerice.com shouldn't be wasting their money to try to stop this kind of thing. If you want to help the hungry, then do so. But don't risk the loss of a website that actually works to help the hungry.