hadoop-common-user mailing list archives

From Kayla Jay <kaylai...@yahoo.com>
Subject RE: Map Intermediate key/value pairs written to file system
Date Fri, 18 Apr 2008 17:16:11 GMT

I don't know how to create unique file names for each mapper's key/value pairs.
How do you create individual files per mapper so that they don't overwrite one another?

I.e., how do you create a new file each time, using the same code in all the mappers, without
each mapper overwriting the others' output? Does that make sense?

For example, at the end of map(), I create a sequence file for the output key/value pair. But
if another mapper is running and does the same thing, the file gets overwritten, because that
mapper creates exactly the same file name and does exactly the same thing as the first one.
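
To make the naming problem concrete, here is a minimal sketch of one way to avoid the collision: build each file name from an identifier that is unique per map task plus a per-record counter. This is only an illustration under assumptions, not Hadoop's own mechanism. The class and method names (PerRecordFiles, fileNameFor, writeRecord) are hypothetical, and it writes plain text files with java.nio rather than sequence files so that it runs without Hadoop. In a real job the task identifier would have to come from the framework; as far as I know the old mapred API exposes the task attempt ID in the job configuration under "mapred.task.id", but please verify that against your Hadoop version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PerRecordFiles {

    // Builds a file name that is unique per (task, record) pair.
    // Assumption: taskId is unique for each running map task.
    public static String fileNameFor(String taskId, long recordIndex) {
        return String.format("%s-%06d.out", taskId, recordIndex);
    }

    // Writes a single key/value pair to its own file.
    // CREATE_NEW makes the write fail rather than silently overwrite
    // a file that already exists.
    public static Path writeRecord(Path dir, String taskId, long recordIndex,
                                   String key, String value) throws IOException {
        Path file = dir.resolve(fileNameFor(taskId, recordIndex));
        byte[] bytes = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        Files.write(file, bytes, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("map-output");
        // Two simulated mappers emit the same key/value pair as their
        // first record; the hypothetical task IDs keep the files apart.
        Path a = writeRecord(dir, "task_m_000000", 0, "k1", "v1");
        Path b = writeRecord(dir, "task_m_000001", 0, "k1", "v1");
        System.out.println(a.getFileName());
        System.out.println(b.getFileName());
    }
}
```

The same naming idea should carry over to a sequence file written from inside map(): only the way the file is opened changes, not the way its name is built.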

Devaraj Das <ddas@yahoo-inc.com> wrote:
Will your requirement be addressed if, from within the map method, you
create a sequence file using the SequenceFile.createWriter API, write a
key/value pair using the writer's append(key, value) API, and then close
the file? You can do this for every key/value pair.
Pls have a look at createWriter APIs and the Writer class in

> -----Original Message-----
> From: Kayla Jay [mailto:kaylais30@yahoo.com] 
> Sent: Friday, April 18, 2008 6:12 PM
> To: core-user@hadoop.apache.org
> Subject: Map Intermediate key/value pairs written to file system
> Hi,
> I have no reduces. I would like to write each map's results 
> directly to disk as they are produced, rather than collecting 
> them and then writing them to the output.
> Is it possible to write my map output (the intermediate 
> key/value pairs) one by one into individual files as each map 
> completes, instead of collecting them until the end and then 
> writing them in one swoop into the composite results file 
> (part-000X)? If so, how do I do that?
> Can I force a write within the map so that the key/value 
> pairs go to an individual file for each result set, instead of 
> calling output.collect and having all the key/value pairs 
> written to the output?
> I.e., I would like the intermediate key/value pairs produced 
> by the maps to be written to disk immediately, rather than 
> collected and written out at the end. I want individual files 
> per key/value pair produced.
> Thanks.
