hadoop-common-user mailing list archives

From Thorsten Schuett <schu...@gmail.com>
Subject Re: Reduce Performance
Date Thu, 23 Aug 2007 08:20:16 GMT
I added multi-threading to the map phase of the LocalRunner. The code is in 
the attached patch.
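
As a rough illustration of the general idea (this is not the attached patch and
does not use any Hadoop classes), running the map tasks of a local job on a
thread pool could look like the sketch below. The class name ParallelMapSketch,
the method runMaps, and the modeling of a "map task" as a plain function over an
input split are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not the attached patch and not Hadoop API.
class ParallelMapSketch {

    // Runs one "map task" per input split on a fixed-size thread pool and
    // returns the results in the same order as the splits.
    static <I, O> List<O> runMaps(List<I> splits, Function<I, O> mapTask, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<O>> tasks = new ArrayList<>();
            for (I split : splits) {
                tasks.add(() -> mapTask.apply(split));
            }
            List<O> results = new ArrayList<>();
            // invokeAll blocks until every submitted task has finished.
            for (Future<O> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, runMaps(splits, split -> process(split), 8) would keep up to 8
map tasks running at once, where process stands in for whatever the real map
function does.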

What I also noticed during my experiments is that I have enough load to easily 
fill 8 cores, although my code should be I/O-bound. I have the feeling that 
SequenceFile or the framework wastes CPU cycles somewhere.
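
One way to check whether a piece of work is really I/O-bound is to compare the
CPU time a thread consumes with the wall-clock time that passes. The sketch
below uses the standard java.lang.management.ThreadMXBean for this; the class
name CpuVsWallSketch and the helper cpuFraction are hypothetical names chosen
for this example, and it measures only the calling thread, not a whole job.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Hadoop.
class CpuVsWallSketch {

    // Returns (CPU time) / (wall-clock time) for the calling thread while it
    // runs the task. Values near 1.0 suggest CPU-bound work; values near 0.0
    // suggest the thread mostly waited (for example on I/O).
    static double cpuFraction(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("per-thread CPU time not available");
        }
        long cpuStart = bean.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = bean.getCurrentThreadCpuTime() - cpuStart;
        return wall == 0 ? 0.0 : (double) cpu / wall;
    }
}
```

Wrapping a read loop over the input in cpuFraction would give a first hint:
a fraction close to 1.0 would be consistent with cycles being spent in the
reading or deserialization code rather than in waiting for the disk.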

Thorsten

On Saturday 18 August 2007, Thorsten Schuett wrote:
> Hi,
>
> First of all, thanks for Hadoop. It's amazing how much you can get done
> with a small Hadoop job.
>
> My setup is a little bit different from the usual. I have a mid-sized
> Opteron machine with the data resting on a local RAID. I configured
> LocalFileSystem and 2 map + 2 reduce tasks per core.
>
> During the reduce phase I see rather slow copy rates in the web interface
> and <50% CPU usage in total. vmstat shows that Hadoop constantly reads
> ~10-20MB/s and writes in short bursts at higher speeds (>100MB/s).
> Neither the disks nor the CPUs seem to be the bottleneck.
>
> What's interesting, though, is the traffic on the loopback device. There is
> constant traffic of the same order as the read rate mentioned above. Please
> correct me if I am wrong, but it looks like Hadoop is using the RPC
> mechanism to copy the map output files to the reduce task (in this case via
> the loopback device). If my assumptions are correct, would it be possible
> to read/access the files directly in the "one-node mode"?
>
> Thanks,
>   Thorsten
