hadoop-mapreduce-user mailing list archives

From Christoph Schmitz <Christoph.Schm...@1und1.de>
Subject AW: Out-of-band writing from mapper
Date Wed, 20 Apr 2011 12:51:46 GMT

You are probably using the wrong MultipleOutputs, the one from the org.apache.hadoop.mapred.lib package.
There is another one in org.apache.hadoop.mapreduce.lib.output, which fits the new 0.20 API.
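To illustrate the suggestion above, here is a minimal sketch of the new-API MultipleOutputs usage, assuming a Hadoop release that actually ships org.apache.hadoop.mapreduce.lib.output.MultipleOutputs (it appeared after plain 0.20.2, e.g. in 0.21 and vendor 0.20 backports). The class name MultiOutMapper and the named output "text" are illustrative, not from the thread:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MultiOutMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private MultipleOutputs<LongWritable, Text> mos;

    // Driver side: register the named output on a Job, not on the
    // deprecated JobConf.
    public static void configureJob(Job job) {
        MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
                LongWritable.class, Text.class);
    }

    @Override
    protected void setup(Context context) {
        // The new-API constructor takes the task context directly,
        // so no Reporter is needed.
        mos = new MultipleOutputs<LongWritable, Text>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Replaces the old getCollector("text", reporter).collect(...) call.
        mos.write("text", key, new Text("Hello"));
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close(); // flush and close the extra outputs
    }
}
```

This addresses both of the issues below: the driver registers the named output against a Job instead of the deprecated JobConf, and the mapper writes via mos.write(...) using only the Context.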


-----Original Message-----
From: Panayotis Antonopoulos [mailto:antonopoulospan@hotmail.com] 
Sent: Wednesday, 20 April 2011 14:27
To: mapreduce-user@hadoop.apache.org
Subject: RE: Out-of-band writing from mapper

I am trying to use the MultipleOutputs class with Hadoop 0.20.2 and I have the following problems:

1) The JobConf and org.apache.hadoop.mapred.TextOutputFormat classes, which are needed to call MultipleOutputs.addNamedOutput((JobConf) configuration, "text", TextOutputFormat.class, LongWritable.class, Text.class), are deprecated.

2) I cannot get a Reporter from the mapper so that I can call mos.getCollector("text", context).collect(emmitedKey, new Text("Hello")). I only have the Context.

Can you please help me with that?
Is there any other way to have multiple outputs from the mapper?
Thank you in advance,
Panagiotis A

> From: harsh@cloudera.com
> Date: Wed, 20 Apr 2011 14:48:10 +0530
> Subject: Re: Out-of-band writing from mapper
> To: mapreduce-user@hadoop.apache.org
> Hello Christoph,
> On Wed, Apr 20, 2011 at 2:12 PM, Christoph Schmitz
> <Christoph.Schmitz@1und1.de> wrote:
> > My question is: is there any mechanism to assist me in writing to some designated
> > place in the HDFS from the mapper, in a way that is recognized by the framework (i.e.
> > dealing with aborted tasks, speculative execution, etc.)?
> >
> > I was thinking along the lines of what is described in the FAQ here:
> >
> > http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
> >
> > The FAQ explains that for reducers there is support for special per-task output
> > directories that are recognized by the framework, but it seems (I tried it out)
> > that this is not supported for mappers.
> [Perhaps you can consider using the MultipleOutputs class to write
> output files from your job, instead of writing your own FS handling
> code.]
> The attempt directories are created for both Map and Reduce tasks. If
> the FAQ makes this ambiguous, it ought to be fixed :)
> -- 
> Harsh J
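
For reference, the per-task attempt directories that Harsh mentions can be used for framework-managed side files from a mapper. The following is a hedged sketch, not code from the thread: it assumes the new-API FileOutputFormat.getWorkOutputPath(context), which returns the attempt's private work directory; files created there are promoted to the job output on commit and discarded if the attempt is killed (covering aborted tasks and speculative execution). The class name SideFileMapper and the file name side-data.txt are illustrative:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SideFileMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private long count = 0;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        count++;
        context.write(key, value); // normal map output still goes to the framework
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // The attempt-scoped work directory: writes here are only made
        // visible in the job output if this task attempt succeeds.
        Path workDir = FileOutputFormat.getWorkOutputPath(context);
        Path sideFile = new Path(workDir, "side-data.txt");
        FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
        try (FSDataOutputStream out = fs.create(sideFile, false)) {
            out.writeBytes("records seen: " + count + "\n");
        }
    }
}
```

Writing the side file once in cleanup() avoids re-creating it on every map() call; a duplicate-suppressing file name per task is not needed because each attempt gets its own work directory.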
