hadoop-mapreduce-user mailing list archives

From Christoph Schmitz <Christoph.Schm...@1und1.de>
Subject AW: How to merge several SequenceFile into one?
Date Thu, 12 May 2011 13:44:57 GMT
Oops, sorry, I answered in the wrong thread. I intended to reply to the "How to create a SequenceFile
faster" issue.

Regards,
Christoph

-----Original Message-----
From: 丛林 [mailto:conglin02@gmail.com]
Sent: Thursday, 12 May 2011 14:30
To: mapreduce-user@hadoop.apache.org
Subject: Re: How to merge several SequenceFile into one?

Hi Christoph,

If there is no reducer, how can these sequence files be merged?

Thanks for your advice.

Best Wishes,

-Lin
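
The question above turns on how many output files each kind of job writes. In Hadoop, a map-only job writes one output file per map task, while a job with a single reducer writes exactly one file. What follows is a toy model of that difference, not Hadoop code: plain Java collections stand in for SequenceFiles, the names ToyMerge, Rec, mapOnly and singleReducer are made up for illustration, and it assumes one map task per input file (Hadoop actually creates one per input split).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only: lists stand in for SequenceFiles, method calls for jobs.
class ToyMerge {
    /** One key/value record, standing in for a SequenceFile entry. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    /** Map-only job (0 reducers): every map task writes its own output file. */
    static List<List<Rec>> mapOnly(List<List<Rec>> inputFiles) {
        List<List<Rec>> outputs = new ArrayList<>();
        for (List<Rec> file : inputFiles) {
            // Identity map: records are copied through, no shuffle and no sort.
            outputs.add(new ArrayList<>(file));
        }
        return outputs;
    }

    /** Single-reducer job: all records reach one reducer, sorted by key. */
    static List<List<Rec>> singleReducer(List<List<Rec>> inputFiles) {
        List<Rec> merged = new ArrayList<>();
        for (List<Rec> file : inputFiles) {
            merged.addAll(file);
        }
        // Stands in for the shuffle/sort phase that precedes the reducer.
        merged.sort(Comparator.comparing((Rec r) -> r.key));
        List<List<Rec>> outputs = new ArrayList<>();
        outputs.add(merged); // exactly one output file
        return outputs;
    }

    public static void main(String[] args) {
        List<List<Rec>> inputs = new ArrayList<>();
        inputs.add(Arrays.asList(new Rec("b", "2"), new Rec("a", "1")));
        inputs.add(Arrays.asList(new Rec("c", "3")));
        System.out.println("map-only outputs: " + mapOnly(inputs).size());
        System.out.println("single-reducer outputs: " + singleReducer(inputs).size());
    }
}
```

In this model the map-only variant avoids the sort but leaves as many files as it started with, so on its own it does not merge anything; the single-reducer variant pays for the sort and produces one file.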

On 12 May 2011 at 7:44 PM, Christoph Schmitz <Christoph.Schmitz@1und1.de> wrote:
> Hi Lin,
>
> you could run a map-only job, i.e. read your data and output it from the mapper without
> any reducer at all (set mapred.reduce.tasks=0 or, equivalently, use job.setNumReduceTasks(0)).
>
> That way, you parallelize over your inputs through a number of mappers and do not have
> any sort/shuffle/reduce overhead.
>
> Regards,
> Christoph
>
> -----Original Message-----
> From: 丛林 [mailto:conglin02@gmail.com]
> Sent: Thursday, 12 May 2011 13:16
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: How to merge several SequenceFile into one?
>
> Dear Jason,
>
> If the order of the keys in the sequence file is not important to me, in
> other words, if the sort is not necessary, how can I skip the
> distributed sort to save resources?
>
> Thanks for your suggestion.
>
> Best Wishes,
>
> -Lin
>
> 2011/5/12 jason <urgisb@gmail.com>:
>> An M/R job with a single reducer would do the job. This way you can
>> utilize the distributed sort and merge/combine/dedupe key/values as you
>> wish.
>>
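
The single-reducer job suggested above could look roughly like the driver below. It is a minimal sketch under stated assumptions, not a tested implementation: the class name MergeSequenceFiles and the two path arguments are hypothetical, keys and values are assumed to be Text (substitute whatever types the input files actually contain), and Job.getInstance is the Hadoop 2.x form (older releases use new Job(conf, name) instead). No mapper or reducer class is set, so Hadoop's default identity Mapper and Reducer pass the records through unchanged.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver: merges the SequenceFiles under args[0] into one file under args[1].
public class MergeSequenceFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "merge-sequence-files");
        job.setJarByClass(MergeSequenceFiles.class);

        // Read and write SequenceFiles rather than plain text.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // ASSUMPTION: the input files hold Text keys and Text values.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // One reducer means one output file (part-r-00000), sorted by key.
        job.setNumReduceTasks(1);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

All records pass through that one reducer, so the merge step itself is not parallel; that is the cost of getting a single output file this way.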
>> On 5/11/11, 丛林 <conglin02@gmail.com> wrote:
>>> Hi all,
>>>
>>> There are lots of SequenceFiles in HDFS. How can I merge them into one
>>> SequenceFile?
>>>
>>> Thanks for your suggestion.
>>>
>>> -Lin
>>>
>>
>