hadoop-common-user mailing list archives

From CubicDesign <cubicdes...@gmail.com>
Subject Re: Processing 10MB files in Hadoop
Date Thu, 26 Nov 2009 20:23:58 GMT

> Are the record processing steps bound by a local machine resource - cpu,
> disk io or other?
Some disk I/O, but not much compared with the CPU. Basically the job is
CPU-bound; this is why each machine has 16 cores.
> What I often do when I have lots of small files to handle is use the
> NLineInputFormat,
Each file contains a complete/independent set of records. I cannot mix
the data resulting from processing two different files.

Ok. I think I need to re-explain my problem :)
While running jobs on these small files, the computation time was almost
5 times longer than expected. It looks like the job was affected by the
number of map tasks I have (100). I don't know what the best parameters
are in my case (10MB files).
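For context on why 100 map tasks can hurt here: each 10MB file is well below the default HDFS block size, so each file becomes its own input split and its own map task, and with short-running tasks the per-task JVM startup cost can dominate the actual computation. One commonly suggested mitigation in that situation is JVM reuse. A minimal sketch of the relevant job configuration (the property name below is from the 0.20-era `mapred-site.xml` naming and should be checked against the Hadoop version actually in use):

```xml
<!-- Sketch only: lets one JVM run many map tasks of the same job,
     so ~100 small tasks do not each pay JVM startup cost.
     -1 means "reuse without limit within a job". -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
```

Since the files must stay independent, this keeps one file per map task and only amortizes the startup overhead; packing several files into one split (e.g. via CombineFileInputFormat) would be the more aggressive option, if mixing files within a single mapper is acceptable.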

I have zero reduce tasks.
