hadoop-common-user mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Question from a Desperate Java Newbie
Date Wed, 15 Dec 2010 17:18:15 GMT
On 10/12/10 09:08, Edward Choi wrote:
> I was wrong. It wasn't because of the "read once free" policy. I tried again with Java
> first again and this time it didn't work.
> I looked up google and found the Http Client you mentioned. It is the one provided by
> apache, right? I guess I will have to try that one now. Thanks!
> 

httpclient is good. HtmlUnit also has a very good client that can simulate
a full web browser, cookies included, but that may be overkill.

NYT's read-once policy uses cookies to verify that you are there for the
first day, not logged in. On later days you get 302'd unless you delete
the cookie, so stateful clients are bad.
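To make the cookie point concrete, here is a minimal sketch of a stateless fetch using only the JDK's HttpURLConnection. The class and method names (StatelessFetch, open, isRedirect) are made up for illustration, and the assumption is simply that not keeping cookies and not following redirects lets you see what the server really sends back; it has not been checked against the NYT site.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of any library.
public class StatelessFetch {

    // Prepare a GET that stores no cookies and does not follow redirects,
    // so a 302 from the server stays visible to the caller.
    static HttpURLConnection open(String url) {
        try {
            // With no process-wide cookie handler, nothing is kept between requests.
            CookieHandler.setDefault(null);
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setRequestMethod("GET");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True for the status codes that redirect to another location.
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StatelessFetch <url>");
            return;
        }
        HttpURLConnection conn = open(args[0]);
        int status = conn.getResponseCode(); // the request is sent here
        System.out.println("status: " + status);
        if (isRedirect(status)) {
            System.out.println("redirected to: " + conn.getHeaderField("Location"));
        }
        conn.disconnect();
    }
}
```

If the status printed is a 302, the Location header shows where the site is trying to send you, which is usually enough to tell a login wall from an ordinary redirect.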

What you may have been hit by is whatever robot trap they have: if you
generate too much load and don't follow the robots.txt rules, they may
detect this and push back.
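A rough sketch of being polite about it, in plain Java: read the Disallow rules that apply to all user agents, skip paths they cover, and pause between requests. Everything here is an assumption for illustration (the PoliteCrawl class, the sample robots.txt text, the paths, the one-second delay), and the parser is deliberately simplified: it ignores Allow lines, wildcards and Crawl-delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of any library.
public class PoliteCrawl {

    // Collect the Disallow prefixes from the "User-agent: *" groups of a robots.txt body.
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> rules = new ArrayList<>();
        boolean applies = false;        // does the current group cover all agents?
        boolean prevWasAgent = false;   // consecutive User-agent lines share one group
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw.replaceFirst("#.*", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean all = value.equals("*");
                applies = prevWasAgent ? (applies || all) : all;
                prevWasAgent = true;
            } else {
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    rules.add(value);
                }
                prevWasAgent = false;
            }
        }
        return rules;
    }

    // A path is allowed when no Disallow prefix matches it.
    static boolean isAllowed(String path, List<String> rules) {
        for (String prefix : rules) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample text only; a real crawler would download /robots.txt from the site.
        String robots = "User-agent: *\nDisallow: /private/\n";
        List<String> rules = disallowedForAll(robots);
        long delayMillis = 1000; // assumed pause between requests, tune to the site
        for (String path : new String[] {"/index.html", "/private/data", "/news"}) {
            if (!isAllowed(path, rules)) {
                System.out.println("skip  " + path);
                continue;
            }
            System.out.println("fetch " + path); // the actual HTTP request would go here
            Thread.sleep(delayMillis);           // keep the load low
        }
    }
}
```

For a Hadoop job this matters even more, since many map tasks hitting the same host at once multiply the load; the delay would need to be enforced per host, not per task.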

