hadoop-common-user mailing list archives

From Alejandro Abdelnur <t...@cloudera.com>
Subject Re: Re: Help!! The problem about Hadoop
Date Tue, 05 Oct 2010 10:07:48 GMT
Or you could try using MultiFileInputFormat for your MR job.

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred/MultiFileInputFormat.html
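In the old mapred API that might look like the following sketch. Note that MultiFileInputFormat is abstract, so you subclass it and supply the RecordReader yourself; MyMultiFileInputFormat, MyJob, and the paths below are placeholder names, not part of any shipped API:

```java
// Sketch only: a job driver using a hypothetical MultiFileInputFormat
// subclass that packs many small files into each split. See the
// MultiFileWordCount example in the Hadoop distribution for a
// complete getRecordReader() implementation.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.*;

public class MyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MyJob.class);
        conf.setInputFormat(MyMultiFileInputFormat.class); // your subclass
        FileInputFormat.setInputPaths(conf, new Path("/user/jander/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/jander/output"));
        JobClient.runJob(conf);
    }
}
```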

Alejandro

On Tue, Oct 5, 2010 at 4:55 PM, Harsh J <qwertymaniac@gmail.com> wrote:
> 500 small files comprising one gigabyte? Perhaps you should try
> concatenating them all into one big file and trying again; a mapper
> should optimally run for at least a minute, and small files don't
> make good use of the HDFS block structure.
>
> Have a read: http://www.cloudera.com/blog/2009/02/the-small-files-problem/
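As a rough illustration of the concatenation idea, the small files could be merged locally before uploading the result to HDFS (a plain-Java sketch; class and directory names are illustrative):

```java
import java.io.*;
import java.util.Arrays;

public class ConcatSmallFiles {
    // Merge every file in inputDir into a single output file, so the
    // job reads one large input instead of hundreds of tiny ones.
    public static void concat(File inputDir, File output) throws IOException {
        File[] parts = inputDir.listFiles();
        Arrays.sort(parts); // deterministic order
        OutputStream out = new BufferedOutputStream(new FileOutputStream(output));
        byte[] buf = new byte[8192];
        for (File part : parts) {
            InputStream in = new BufferedInputStream(new FileInputStream(part));
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            in.close();
        }
        out.close();
    }
}
```

The merged file can then be put into HDFS with `hadoop fs -put`.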
>
> 2010/10/5 Jander <442950758@163.com>:
>> Hi Jeff,
>>
>> Thank you very much for your reply.
>>
>> I know Hadoop has overhead, but is it really this large in my case?
>>
>> The 1GB text input produces about 500 map tasks, because the input is made up of little text files. Each map task takes between 8 and 20 seconds. I use compression, e.g. conf.setCompressMapOutput(true).
>>
>> Thanks,
>> Jander
>>
>>
>>
>>
>> At 2010-10-05 16:28:55, "Jeff Zhang" <zjffdu@gmail.com> wrote:
>>
>>>Hi Jander,
>>>
>>>Hadoop has overhead compared to a single-machine solution. How many
>>>tasks did you get when you ran your Hadoop job? And how long does
>>>each map and reduce task take?
>>>
>>>There are lots of tips for tuning Hadoop performance, such as
>>>compression and JVM reuse.
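In the old mapred API those two knobs might be set as in this fragment (a sketch; GzipCodec is one possible codec choice, and MyJob is a placeholder class name):

```java
// Sketch: enabling map-output compression and JVM reuse with the old
// org.apache.hadoop.mapred API.
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf(MyJob.class);           // MyJob is hypothetical
conf.setCompressMapOutput(true);                   // compress intermediate map output
conf.setMapOutputCompressorClass(GzipCodec.class); // codec choice is an assumption
conf.setNumTasksToExecutePerJvm(-1);               // -1 = unlimited tasks per JVM (reuse)
```

JVM reuse matters here because with ~500 short map tasks, JVM startup cost is paid hundreds of times otherwise.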
>>>
>>>
>>>2010/10/5 Jander <442950758@163.com>:
>>>> Hi, all
>>>> I built an application using Hadoop.
>>>> I take 1GB of text data as input, with the following results:
>>>>    (1) the cluster of 3 PCs: the time consumed is 1020 seconds.
>>>>    (2) the cluster of 4 PCs: the time is about 680 seconds.
>>>> But the application took about 280 seconds before I used Hadoop, so at the speeds above I would need about 8 PCs to match the original speed. My question: is this behavior expected?
>>>>
>>>> Jander,
>>>> Thanks.
>>>>
>>>>
>>>>
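A back-of-envelope check of the numbers above (assuming work scales linearly with node count, which ignores fixed per-job overhead):

```java
public class ScalingEstimate {
    // Estimate the nodes needed to match a target time, given one
    // measured cluster run (e.g. 3 PCs -> 1020 s, 4 PCs -> 680 s).
    public static int nodesNeeded(int nodes, int seconds, int targetSeconds) {
        int nodeSeconds = nodes * seconds;               // total work, in node-seconds
        return (nodeSeconds + targetSeconds - 1) / targetSeconds; // round up
    }

    public static void main(String[] args) {
        System.out.println(nodesNeeded(3, 1020, 280)); // -> 11
        System.out.println(nodesNeeded(4, 680, 280));  // -> 10
    }
}
```

So by this crude model, matching the 280-second single-machine time takes roughly 10 to 11 PCs, in the same ballpark as the 8-PC estimate in the question.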
>>>
>>>
>>>
>>>--
>>>Best Regards
>>>
>>>Jeff Zhang
>>
>
>
>
> --
> Harsh J
> www.harshj.com
>
