hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Re: Does Using MultipleTextOutputFormat Require the Deprecated API?
Date Tue, 15 Dec 2009 16:04:51 GMT
Amogh,

Thanks for the attachment.  I'll hold on to it.

If I may press you a bit further, I noticed that the directory tree in the
distribution I downloaded differs from the paths I see in the patch, and the
svn trunk is different again.

What I want is to apply the patch to my hadoop 0.20.1 distribution.  It
doesn't apply cleanly because the paths in the patch don't match the
directories in the release.  I suppose I could hack on the patch, but it seems
I shouldn't have to.

Why do these three (release, trunk, patch) differ?  Am I using the wrong
code base?

On Mon, Dec 14, 2009 at 9:30 PM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:

>  Yes. Also attached is an old thread I have kept handy with me. Hope this
> helps you.
>
>
> Thanks,
> Amogh
>
>
> On 12/11/09 10:07 PM, "Geoffry Roberts" <geoffry.roberts@gmail.com> wrote:
>
> Amogh,
>
> I don't have experience with patches for hadoop.
>
> I take it that I apply this patch using the linux patch utility.
>
> I further assume I need only apply the latest patch, which is 5.
>
> Am I correct?
>
> On Wed, Dec 9, 2009 at 7:30 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>
> http://issues.apache.org/jira/browse/MAPREDUCE-370
>
> You’ll have to work around it for now or try to apply the patch.
>
> Amogh
>
>
>
> On 12/9/09 8:54 PM, "Geoffry Roberts" <geoffry.roberts@gmail.com> wrote:
>
> Aaron,
>
> I am using 0.20.1 and I'm not finding
> org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.  I'm using the
> download page, where the tarball is dated Sep. 09.
>
>
>
> Sounds like I need to look at the code repository.
>
>
>
> On Tue, Dec 8, 2009 at 1:39 PM, Aaron Kimball <aaron@cloudera.com> wrote:
>
> Geoffry,
>
> There are two MultipleOutputs implementations; one for the new API, one for
> the old one.
>
> The new API (org.apache.hadoop.mapreduce.lib.output.MultipleOutputs) does
> not have a getCollector() method. This is intended to work with
> org.apache.hadoop.mapreduce.Mapper and its associated Context object.
>
> The old API implementation of MO
> (org.apache.hadoop.mapred.lib.MultipleOutputs) is intended to work with
> org.apache.hadoop.mapred.Mapper, Reporter, and friends.
>
> If you're going to use the new org.apache.hadoop.mapreduce-based code, you
> should not need to import anything in the mapred package. That having been
> said -- I just realized that the new-API-compatible MultipleOutputs
> implementation is not in Hadoop 0.20. It's only in the unreleased 0.21. If
> you're using 0.20, you should probably stick with the old API for your
> process.
>
> Cheers,
> - Aaron
>
>
> On Tue, Dec 8, 2009 at 12:40 PM, Geoffry Roberts <geoffry.roberts@gmail.com>
> wrote:
>
> All,
>
> This one has me stumped.
>
> What I want to do is output from my reducer multiple files, one for each
> key value. I also want to avoid any deprecated parts of the API.
>
> As suggested, I switched from using MultipleTextOutputFormat to
> MultipleOutputs but have run into an impasse.  MultipleOutputs' getCollector
> method requires a Reporter as a parameter, but as far as I can tell, the API
> doesn't support this.  The only reporter I can find is in the context
> object, but is declared protected.
>
> Am I stuck, or just missing something?
>
> My code:
>
> @Override
> public void reduce(Text key, Iterable<Text> values, Context context)
>         throws IOException {
>     String fileName = key.toString();
>     MultipleOutputs.addNamedOutput((JobConf) context.getConfiguration(),
>             fileName, OutputFormat.class, Text.class, Text.class);
>     mos = new MultipleOutputs((JobConf) context.getConfiguration());
>     for (Text line : values) {
>         // This is the problem line:
>         mos.getCollector(fileName, <reporter goes here>).collect(key, line);
>     }
>     mos.close();
> }
>
> On Mon, Oct 5, 2009 at 11:17 AM, Aaron Kimball <aaron@cloudera.com> wrote:
>
> Geoffry,
>
> The new API comes with a related OF, called MultipleOutputs
> (o.a.h.mapreduce.lib.output.MultipleOutputs). You may want to look into
> using this instead.
>
> - Aaron
>
>
> On Tue, Sep 29, 2009 at 4:44 PM, Geoffry Roberts <geoffry.roberts@gmail.com>
> wrote:
>
> All,
>
> What I want to do is output from my reducer multiple files, one for each key
> value.
>
> Can this still be done in the current API?
>
> It seems that using MultipleTextOutputFormat requires one to use deprecated
> parts of API.
>
> Is this correct?
>
> I would like to use the class or its equivalent and stay off anything
> deprecated.
>
> Is there a work around?
>
> In the current API one uses Job and a class derived from the class
> org.apache.hadoop.mapreduce.OutputFormat.  MultipleTextOutputFormat does not
> derive from this class.
>
> Job.setOutputFormatClass(Class<? extends org.apache.hadoop.mapreduce.OutputFormat>);
>
>
> In the old, deprecated API, one uses JobConf and an implementation of the
> interface org.apache.hadoop.mapred.OutputFormat.  MultipleTextOutputFormat
> is just such an implementation.
>
> JobConf.setOutputFormat(Class<? extends org.apache.hadoop.mapred.OutputFormat>);
>
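A note on the reducer code quoted in this thread, which passes key.toString()
straight to addNamedOutput: as far as I recall, the Javadoc for the old-API
org.apache.hadoop.mapred.lib.MultipleOutputs says a named output has to be
letters and numbers only and cannot be the word "part". Under that rule a key
containing dashes, dots or slashes would be rejected. Below is a minimal,
Hadoop-free sketch of a helper that maps a key to such a name. The class and
method names are hypothetical, and the naming rule itself is an assumption
that should be checked against the Javadoc of the release in use.

```java
// Hypothetical helper, not part of Hadoop: turns an arbitrary reducer key
// into a string made only of ASCII letters and digits.
class NamedOutputNames {

    // Assumption: the old-API MultipleOutputs accepts only [A-Za-z0-9] in a
    // named output and reserves the word "part" for the default output.
    static String toNamedOutput(String key) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            boolean letterOrDigit = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (letterOrDigit) {
                sb.append(c); // drop every other character
            }
        }
        String name = sb.toString();
        // Avoid an empty name and the reserved word by adding a prefix.
        if (name.isEmpty() || name.equals("part")) {
            return "key" + name;
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(toNamedOutput("2009-12-15")); // prints 20091215
        System.out.println(toNamedOutput("part"));       // prints keypart
        System.out.println(toNamedOutput("--"));         // prints key
    }
}
```

Distinct keys can collapse to the same name under this scheme ("a-b" and "ab"
both become "ab"), so the original key should still be written into each
record if it matters downstream.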
