hadoop-mapreduce-user mailing list archives

From Panayotis Antonopoulos <antonopoulos...@hotmail.com>
Subject RE: AW: How to merge several SequenceFile into one?
Date Wed, 25 May 2011 01:31:29 GMT

I would like to merge some SequenceFiles as well, so any help would be great!

The single-reducer solution works great, but my files are small, so I don't need a
distributed job.
I think I will write a simple Java program that reads these files and merges them.
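
A minimal sketch of such a standalone merge program, under stated assumptions: Hadoop 2.x or later on the classpath, keys and values that are Writable types, and all input files sharing the same key and value classes. The class name MergeSequenceFiles and the merge method are hypothetical names chosen for illustration. Records are appended in the order the files are listed, with no sorting or deduplication.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical helper: concatenates every SequenceFile in a directory into one file.
public final class MergeSequenceFiles {

    // Returns the number of records copied. The output path must lie outside inputDir.
    public static long merge(Configuration conf, Path inputDir, Path output) throws IOException {
        FileSystem fs = inputDir.getFileSystem(conf);
        SequenceFile.Writer writer = null;
        Class<?> keyClass = null;
        Class<?> valueClass = null;
        long records = 0;
        try {
            for (FileStatus status : fs.listStatus(inputDir)) {
                String name = status.getPath().getName();
                // Skip subdirectories and marker/hidden files such as _SUCCESS.
                if (status.isDirectory() || name.startsWith("_") || name.startsWith(".")) {
                    continue;
                }
                try (SequenceFile.Reader reader =
                         new SequenceFile.Reader(conf, SequenceFile.Reader.file(status.getPath()))) {
                    if (writer == null) {
                        // The first file decides the key/value classes of the merged file.
                        keyClass = reader.getKeyClass();
                        valueClass = reader.getValueClass();
                        writer = SequenceFile.createWriter(conf,
                                SequenceFile.Writer.file(output),
                                SequenceFile.Writer.keyClass(keyClass),
                                SequenceFile.Writer.valueClass(valueClass));
                    } else if (!keyClass.equals(reader.getKeyClass())
                            || !valueClass.equals(reader.getValueClass())) {
                        throw new IOException("Key/value classes differ in " + status.getPath());
                    }
                    // Reuse one key and one value object per input file.
                    Writable key = (Writable) ReflectionUtils.newInstance(keyClass, conf);
                    Writable value = (Writable) ReflectionUtils.newInstance(valueClass, conf);
                    while (reader.next(key, value)) {
                        writer.append(key, value);
                        records++;
                    }
                }
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: MergeSequenceFiles <inputDir> <outputFile>");
            System.exit(1);
        }
        long n = merge(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.out.println("Merged " + n + " records into " + args[1]);
    }
}
```

Everything runs in a single JVM, so this only makes sense for small inputs like mine. For large inputs the single-reducer job suggested earlier in the thread is probably the better fit.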

> From: Christoph.Schmitz@1und1.de
> To: mapreduce-user@hadoop.apache.org
> Date: Thu, 12 May 2011 15:44:57 +0200
> Subject: AW: How to merge several SequenceFile into one?
> 
> Oops, sorry, I answered in the wrong thread. I intended to reply to the "How to create
> a SequenceFile faster" issue.
> 
> Regards,
> Christoph
> 
> -----Original Message-----
> From: 丛林 [mailto:conglin02@gmail.com]
> Sent: Thursday, 12 May 2011 14:30
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: How to merge several SequenceFile into one?
> 
> Hi Christoph,
> 
> If there is no reducer, how can these sequence files be merged?
> 
> Thanks for your advice.
> 
> Best Wishes,
> 
> -Lin
> 
> On 12 May 2011 at 7:44 PM, Christoph Schmitz <Christoph.Schmitz@1und1.de> wrote:
> > Hi Lin,
> >
> > you could run a map-only job, i.e. read your data and output it from the mapper
> > without any reducer at all (set mapred.reduce.tasks=0 or, equivalently, use
> > job.setNumReduceTasks(0)).
> >
> > That way, you parallelize over your inputs through a number of mappers and do not
> > have any sort/shuffle/reduce overhead.
> >
> > Regards,
> > Christoph
> >
> > -----Original Message-----
> > From: 丛林 [mailto:conglin02@gmail.com]
> > Sent: Thursday, 12 May 2011 13:16
> > To: mapreduce-user@hadoop.apache.org
> > Subject: Re: How to merge several SequenceFile into one?
> >
> > Dear Jason,
> >
> > If the order of the keys in the sequence file is not important to me, in
> > other words, if sorting is not necessary, how can I skip the
> > distributed sort to save resources?
> >
> > Thanks for your suggestion.
> >
> > Best Wishes,
> >
> > -Lin
> >
> > 2011/5/12 jason <urgisb@gmail.com>:
> >> M/R job with a single reducer would do the job. This way you can
> >> utilize distributed sort and merge/combine/dedupe key/values as you
> >> wish.
> >>
> >> On 5/11/11, 丛林 <conglin02@gmail.com> wrote:
> >>> Hi all,
> >>>
> >>> There are lots of SequenceFiles in HDFS. How can I merge them into one
> >>> SequenceFile?
> >>>
> >>> Thanks for your suggestions.
> >>>
> >>> -Lin
> >>>
> >>
> >
 		 	   		  