hadoop-common-user mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: Reduce Performance
Date Thu, 23 Aug 2007 00:04:45 GMT
+1

On Aug 22, 2007, at 11:23 AM, Doug Cutting wrote:

> Thorsten Schuett wrote:
> > In my case, it looks as if the loopback device is the bottleneck. So
> > increasing the number of tasks won't help.
>
> Hmm.  I have trouble believing that the loopback device is actually the
> bottleneck.  What makes you think that it is?
>
> To better support standalone use of Hadoop on multicore boxes, perhaps
> we should promote the MiniMR cluster code from test into the core.  This
> runs the tasktracker and jobtracker in the same process.  It still forks
> processes for tasks, and has all the features of a grid setup: web ui,
> task restarting, etc.
>
> I don't think we should spend much effort adding multi-threading to
> LocalRunner, since it lacks so many of the other features of
> TaskTracker/JobTracker.  We should also avoid re-implementing those
> features.  Thus running TaskTracker and JobTracker in the same JVM seems
> like a good strategy for multicore support.
>
> If performance with a MiniMR cluster is not good, then we should
> determine why.  We could, e.g., benchmark and profile sort performance
> in this configuration.  Again, I have a hard time believing that
> loopback bandwidth is a bottleneck.  If it is, then perhaps we can
> optimize around it, but let's first be sure that's the case.
>
> Note that, when running standalone, even with TaskTracker and
> JobTracker, one need not use HDFS.  Direct access to the local
> filesystem will probably be considerably faster.
>
> Doug
>

