hadoop-common-user mailing list archives

From Kayla Jay <kaylai...@yahoo.com>
Subject RE: Map Intermediate key/value pairs written to file system
Date Fri, 18 Apr 2008 18:32:16 GMT
Hi.
 
 Thanks for all of your responses; they have been very helpful indeed.  I am working on the
file-naming approach you suggested and will ask more questions if I get stuck.
 
 I do need one file per key/value pair.  I don't want to collect all the pairs first and then
create the output file.

Is this a bad idea, given the comment that I am misguided?

I can explain what I am doing if you think I am way off in what I am trying to achieve.

Please advise.

Thanks.

Devaraj Das <ddas@yahoo-inc.com> wrote: Well, Kayla specifically mentioned wanting one file
per key/value pair. Kayla should clarify this.

> -----Original Message-----
> From: Ted Dunning [mailto:tdunning@veoh.com] 
> Sent: Friday, April 18, 2008 11:54 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Map Intermediate key/value pairs written to file system
> 
> 
> Yes, but Kayla is likely misguided in this respect.
> 
> (my apologies for sounding doctrinaire)
> 
> 
> On 4/18/08 11:08 AM, "Devaraj Das" wrote:
> 
> > Ted, note that Kayla wants one file per output key/value.
> > 
> >> -----Original Message-----
> >> From: Ted Dunning [mailto:tdunning@veoh.com]
> >> Sent: Friday, April 18, 2008 11:20 PM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: Map Intermediate key/value pairs written to file system
> >> 
> >> 
> >> Isn't this just what Hadoop does when you set numReduces = 0?
> >> 
> >> 
> >> On 4/18/08 10:45 AM, "Devaraj Das" wrote:
> >> 
> >>> Within a task you can get the taskId (which is unique). Define
> >>> "public void configure(JobConf job)" and in it get the taskId by
> >>> calling job.get("mapred.task.id").
> >>> 
> >>> Now create filenames with that as the prefix and, say, a
> >>> monotonically increasing integer as the suffix (defined as a static
> >>> field in the task).
> >>> 
> >>>> -----Original Message-----
> >>>> From: Kayla Jay [mailto:kaylais30@yahoo.com]
> >>>> Sent: Friday, April 18, 2008 10:46 PM
> >>>> To: core-user@hadoop.apache.org
> >>>> Subject: RE: Map Intermediate key/value pairs written to file system
> >>>> 
> >>>> Hi.
> >>>> 
> >>>> I don't know how to create unique individual file names for each
> >>>> mapper's key/value pairs.  How do you create individual files for
> >>>> each mapper's key/value pairs so they don't overwrite one another?
> >>>> 
> >>>> That is, how do you create a new file each time, using the same code
> >>>> in all the mappers, without each mapper overwriting the others'
> >>>> output? I hope that makes sense.
> >>>> 
> >>>> For example, at the end of map(), I create a sequence file for the
> >>>> output key/value pair.  But if another mapper is running and does
> >>>> the same thing, the file gets overwritten, because the other mapper
> >>>> creates exactly the same file name and does exactly the same thing.
> >>>> 
> >>>> Devaraj Das wrote:
> >>>> Will your requirement be addressed if, from within the map method,
> >>>> you create a sequence file using the SequenceFile.createWriter API,
> >>>> write a key/value pair using the writer's append(key, value) API,
> >>>> and then close the file? You can do this for every key/value pair.
> >>>> Please have a look at the createWriter APIs and the Writer class in
> >>>> org.apache.hadoop.io.SequenceFile.
> >>>> 
> >>>>> -----Original Message-----
> >>>>> From: Kayla Jay [mailto:kaylais30@yahoo.com]
> >>>>> Sent: Friday, April 18, 2008 6:12 PM
> >>>>> To: core-user@hadoop.apache.org
> >>>>> Subject: Map Intermediate key/value pairs written to file system
> >>>>> 
> >>>>> Hi
> >>>>> 
> >>>>> I have no reduces. I would like to write my map results directly
> >>>>> to disk as they are produced, after each map has completed.  I
> >>>>> don't want to collect them and then write them to the output.
> >>>>> 
> >>>>> If I wanted to write my map output (the intermediate key/value
> >>>>> pairs) one by one into individual files as each map completes,
> >>>>> instead of collecting them until the end and then writing them in
> >>>>> one sweep into the composite results file (part-000X), is this
> >>>>> possible, and how do I do that?
> >>>>> 
> >>>>> Can I force a write within the map so that the key/value pairs of
> >>>>> each result set go to an individual file, instead of calling
> >>>>> output.collect and having all the key/value pairs written to the
> >>>>> output?
> >>>>> 
> >>>>> That is, I would like the intermediate key/value pairs produced by
> >>>>> the maps to be written to disk immediately, rather than collecting
> >>>>> all of them at the end and writing them out.  I want an individual
> >>>>> file for each key/value pair produced.
> >>>>> 
> >>>>> Thanks.
> >>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >> 
> >> 
> > 
> 
> 
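A minimal sketch of Devaraj's naming scheme from the thread, written in plain Java so it runs without Hadoop: every file name is the task id followed by a monotonically increasing counter kept in a static field. The class name UniqueFileNames, the "-%06d" suffix format, and the sample task id are hypothetical; in a real mapper the prefix would come from job.get("mapred.task.id") inside configure(JobConf).

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: builds file names of the form "<taskId>-<counter>".
public class UniqueFileNames {
    // Static, as suggested in the thread, so every writer in the task shares one
    // counter; AtomicLong keeps it correct even if several threads call next().
    private static final AtomicLong COUNTER = new AtomicLong();

    private final String taskId;

    public UniqueFileNames(String taskId) {
        this.taskId = taskId;
    }

    // Returns the next unused name for this task, e.g. "<taskId>-000000".
    public String next() {
        return String.format("%s-%06d", taskId, COUNTER.getAndIncrement());
    }

    public static void main(String[] args) {
        // In a mapper, taskId would come from job.get("mapred.task.id") inside
        // configure(JobConf); the literal below is made up for the demo.
        UniqueFileNames names = new UniqueFileNames("task_0001_m_000003_0");
        System.out.println(names.next());
        System.out.println(names.next());
    }
}
```

Because the task id differs between tasks, two mappers cannot produce the same name even if their counters hold the same value; the counter only has to keep apart the files written by one task.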

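A minimal sketch of the per-pair approach from Devaraj's SequenceFile suggestion in the thread: create a file, write one key/value pair, close it, and repeat for every pair. To keep it runnable without Hadoop, java.io's DataOutputStream stands in for SequenceFile.createWriter, Writer.append(key, value) and close(); writePair, readPair, and the file-name format are hypothetical. Opening with CREATE_NEW makes a name collision fail loudly instead of silently overwriting another mapper's file, which is the problem Kayla describes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicLong;

public class OneFilePerPair {
    // Hypothetical per-task counter used to make each file name unique.
    private static final AtomicLong COUNTER = new AtomicLong();

    // Create a new file, write exactly one key/value pair, close it.
    // Stands in for SequenceFile.createWriter(...), append(key, value), close().
    static Path writePair(Path dir, String taskId, String key, String value) throws IOException {
        Path file = dir.resolve(String.format("%s-%06d", taskId, COUNTER.getAndIncrement()));
        // CREATE_NEW throws FileAlreadyExistsException instead of overwriting.
        try (DataOutputStream out = new DataOutputStream(
                Files.newOutputStream(file, StandardOpenOption.CREATE_NEW))) {
            out.writeUTF(key);
            out.writeUTF(value);
        }
        return file;
    }

    // Reads back the single pair stored in a file written by writePair.
    static String[] readPair(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            return new String[] {in.readUTF(), in.readUTF()};
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("map-output");
        // Simulates a map() call emitting two pairs; each lands in its own file.
        Path first = writePair(dir, "task_0001_m_000003_0", "apple", "1");
        Path second = writePair(dir, "task_0001_m_000003_0", "banana", "2");
        System.out.println(first.getFileName() + " -> " + String.join("=", readPair(first)));
        System.out.println(second.getFileName() + " -> " + String.join("=", readPair(second)));
    }
}
```

One design cost to keep in mind: every pair becomes its own file, and HDFS keeps metadata for every file in the NameNode's memory, so a job emitting millions of pairs this way produces millions of tiny files.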


       
---------------------------------
Be a better friend, newshound, and know-it-all with Yahoo! Mobile.  Try it now.
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message