hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: Bugs in 0.16.0?
Date Mon, 03 Mar 2008 17:20:54 GMT

Hard-coded delays in order to make a protocol work are almost never correct
in the long run.  This isn't a function of real-time or batch, it is simply
a matter of the fact that hard-coded delays don't scale correctly as problem
sizes/durations change.  *Adaptive* delays such as progressive back-off can
work correctly under scale changes, but *fixed* delays are almost never
correct.

Delays may work as a band-aid in the short run, but eventually you have to
take the band-aid off.
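
As a rough illustration of the adaptive alternative described above, here is a minimal Java sketch that polls a condition with progressive (exponential) back-off instead of a fixed `Thread.sleep(5000)`. It assumes the caller can express "is the thing I'm waiting for ready?" as a `BooleanSupplier`. The class `BackoffPoller`, the method `pollWithBackoff`, and its parameters (`initialMs`, `capMs`, `maxWaitMs`) are hypothetical and are not part of Hadoop's API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: waits with progressive back-off rather than a fixed sleep. */
public class BackoffPoller {

    /**
     * Polls {@code done} until it returns true or {@code maxWaitMs} elapses.
     * The delay starts at {@code initialMs} and doubles after each failed
     * check, capped at {@code capMs}. Returns true if the condition was met.
     */
    public static boolean pollWithBackoff(BooleanSupplier done,
                                          long initialMs,
                                          long capMs,
                                          long maxWaitMs) throws InterruptedException {
        // Monotonic clock, so wall-clock adjustments don't distort the deadline.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMs);
        long delayMs = initialMs;
        while (!done.getAsBoolean()) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // gave up: deadline reached
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(delayMs, remainingMs));
            // Progressive back-off: double the delay, but never exceed the cap.
            delayMs = Math.min(delayMs * 2, capMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        final long start = System.nanoTime();
        // Simulated condition that becomes true after roughly 120 ms.
        boolean ok = pollWithBackoff(
                () -> System.nanoTime() - start >= TimeUnit.MILLISECONDS.toNanos(120),
                10, 5000, 10_000);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("done=" + ok + " after ~" + elapsedMs + " ms");
    }
}
```

With these example parameters the checks happen at roughly 0, 10, 30, 70 and 150 ms, so a short wait costs a fraction of a second, while a long wait still settles into infrequent polls at the cap. A fixed 5-second sleep, by contrast, costs at least 5 seconds no matter how quickly the condition becomes true.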

On 3/3/08 8:46 AM, "Amar Kamat" <amarrk@yahoo-inc.com> wrote:

> Hadoop is not meant for real-time applications. It's more or less designed
> for long-running applications like crawlers/indexers.
> Amar
> On Mon, 3 Mar 2008, Spiros Papadimitriou wrote:
>> Hi
>> I'd be interested to know if you've tried to use Hadoop for a large number
>> of short jobs.  Perhaps I am missing something, but I've found that the
>> hardcoded Thread.sleep() calls, esp. those for 5 seconds in
>> mapred.ReduceTask (primarily) and mapred.JobClient, cause more of a problem
>> than the 0.3 sec or so that it takes to fire up a JVM.
>> Agreed that for long-running jobs that is not a concern, but *if* speeding
>> things up for shorter-running jobs (say < 1 min) is a goal, then JVM reuse
>> would seem to be a lower priority?  Would doing something about those
>> sleep()s be worthwhile?
