hadoop-common-user mailing list archives

From cha...@students.iiit.ac.in
Subject Re: taking lot of time in doing map task after 5% completion
Date Wed, 02 Jul 2008 06:10:17 GMT
> On 7/1/08 2:18 PM, "charan@students.iiit.ac.in"
> <charan@students.iiit.ac.in>
> wrote:
>>    We are converting 1.6 million text inputs into images using Hadoop,
>> but the job slows down badly: it completes 1% in 4 minutes, yet only
>> reaches 3%-4% after 1 hour, and it takes a long time to proceed to
>> 100%. Is there anything wrong with my Hadoop setup, or is there some
>> other problem? It runs quickly when I give it an input of 1000 or
>> 5000, taking only 23 sec to 1 min 13 sec. Each created image is
>> around 13-30 kilobytes.
>
>     It sounds as though you have lots and lots of really small files.
> HDFS doesn't perform well under those conditions and will typically send
> the name node java process into a garbage collection tail spin.  Try
> combining the data into bigger files.
>

  Thank you, Allen.
  We are using 1 input file containing 1.5 million words, and we create an
image for each word, spread over 1500 directories (50 at level 1, each with
30 at level 2), with each directory holding 1000 images.
  Will there be any problem in doing so?
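
To make the numbers concrete, here is a rough Python sketch of the layout described above. It is only an illustration, not our actual job code: the function names, the zero-padded directory names and the `.png` extension are assumptions made for the example. It maps a word index to an output path and counts how many directories and files the job would create.

```python
# Hypothetical sketch of the output layout: 50 level-1 directories,
# 30 level-2 directories under each, and 1000 images per leaf directory.
LEVEL1_DIRS = 50
LEVEL2_DIRS = 30
IMAGES_PER_DIR = 1000


def image_path(word_index: int) -> str:
    """Map a 0-based word index to an (assumed) relative output path."""
    capacity = LEVEL1_DIRS * LEVEL2_DIRS * IMAGES_PER_DIR
    if not 0 <= word_index < capacity:
        raise ValueError(f"index {word_index} outside 0..{capacity - 1}")
    # First split off the slot inside the leaf directory,
    # then split the leaf number into level-1 and level-2 parts.
    leaf, slot = divmod(word_index, IMAGES_PER_DIR)
    level1, level2 = divmod(leaf, LEVEL2_DIRS)
    return f"{level1:02d}/{level2:02d}/{slot:04d}.png"


def namespace_entries() -> dict:
    """Count the directories and files this layout would create."""
    leaf_dirs = LEVEL1_DIRS * LEVEL2_DIRS
    files = leaf_dirs * IMAGES_PER_DIR
    return {
        "level1_dirs": LEVEL1_DIRS,
        "leaf_dirs": leaf_dirs,
        "files": files,
        "total": LEVEL1_DIRS + leaf_dirs + files,
    }


if __name__ == "__main__":
    print(image_path(0))
    print(namespace_entries())
```

So the layout holds exactly 1.5 million small files in 1500 leaf directories, and every one of those files is a separate entry for the name node to track.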



