hadoop-common-user mailing list archives

From Thorsten Schuett <schu...@zib.de>
Subject Re: Reduce Performance
Date Fri, 24 Aug 2007 07:59:02 GMT
On Thursday 23 August 2007, Doug Cutting wrote:
> Thorsten Schuett wrote:
> > During the copy phase of reduce, the cpu load was very low and vmstat
> > showed constant reads from the disk at ~15MB/s and bursty writes. At the
> > same time, data was sent over the loopback device at ~15MB/s. I don't see
> > what else could limit the performance here. The disk can certainly
> > provide the data at higher speeds.
>
> It can if the reads are sequential, but might not if they're random.
> That said, there could well be a Hadoop bottleneck here, but I still
> doubt that it is the loopback device, which is surely capable of greater
> than 15MB/s, no?
To me it looks as if the copy operation limits my reduce performance. But we
can probably agree that it is not a good idea to copy files around when running
on a single node, especially when using HTTP for copying.
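Doug's point that loopback should manage far more than 15MB/s is easy to check
independently of Hadoop. Below is a minimal Python sketch that pushes a fixed
number of bytes through a plain TCP connection on 127.0.0.1 and reports the
rate. It assumes a bare socket transfer is a fair upper bound for what the HTTP
copy could achieve over loopback. It does not exercise Hadoop's HTTP server,
and the chunk size and transfer size are arbitrary illustrative values. If it
reports well above 15MB/s, the limit is more likely disk access patterns or the
HTTP copy path than the loopback device itself.

```python
import socket
import threading
import time

CHUNK = 64 * 1024          # bytes per send/recv call (arbitrary choice)
TOTAL = 64 * 1024 * 1024   # default bytes to push through loopback (arbitrary)


def _sink(server: socket.socket, received: list) -> None:
    """Accept one connection and count every byte until the sender closes it."""
    conn, _ = server.accept()
    with conn:
        n = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:       # empty read means the client closed: EOF
                break
            n += len(data)
    received.append(n)


def loopback_throughput(total: int = TOTAL) -> tuple:
    """Send `total` bytes over 127.0.0.1; return (bytes_received, MB/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    receiver = threading.Thread(target=_sink, args=(server, received))
    receiver.start()

    payload = memoryview(b"\0" * CHUNK)  # reuse one buffer, no per-send copies
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total:
            n = min(CHUNK, total - sent)
            client.sendall(payload[:n])
            sent += n
    # Closing the client signals EOF; wait until the receiver has counted it all.
    receiver.join()
    elapsed = time.perf_counter() - start
    server.close()

    nbytes = received[0]
    return nbytes, nbytes / elapsed / 1e6


if __name__ == "__main__":
    nbytes, mbps = loopback_throughput()
    print(f"{nbytes} bytes over loopback at {mbps:.0f} MB/s")
```

Comparing that figure with the ~15MB/s seen in vmstat during the copy phase
separates the loopback device from the other candidates.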

Thorsten
