hadoop-general mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: MultipleInputs in 0.20
Date Sun, 09 May 2010 23:50:11 GMT
Please refer to MAPREDUCE-1743.

Another option is to copy the MultipleInputs and DelegatingInputFormat classes
into your own code and slightly modify TaggedInputSplit (as I suggested earlier).
That way you can use your own (functional) version :-)
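
To make the idea a bit more concrete, here is a minimal sketch in plain Java of the bookkeeping such a copied MultipleInputs would need: remembering, per input path, which input format and which mapper to use, in a single string that could be stored in the job configuration. It has no Hadoop dependency. The class name PathBindings, the separator characters, and the example paths and class names are all assumptions made for illustration, not the actual Hadoop code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hadoop): binds each input path to the
 * names of the input format and mapper classes that should handle it.
 */
public class PathBindings {

    /** Input format and mapper class names bound to one input path. */
    public static final class Binding {
        public final String inputFormat;
        public final String mapper;

        Binding(String inputFormat, String mapper) {
            this.inputFormat = inputFormat;
            this.mapper = mapper;
        }
    }

    // LinkedHashMap keeps the paths in the order they were added.
    private final Map<String, Binding> bindings = new LinkedHashMap<>();

    /** Registers a path with its input format and mapper class names. */
    public PathBindings add(String path, String inputFormat, String mapper) {
        bindings.put(path, new Binding(inputFormat, mapper));
        return this;
    }

    /** Returns the binding for a path, or null if the path is unknown. */
    public Binding lookup(String path) {
        return bindings.get(path);
    }

    /**
     * Encodes all bindings as "path;format;mapper,path;format;mapper".
     * Assumes paths and class names contain neither ',' nor ';'.
     */
    public String encode() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Binding> e : bindings.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(';')
              .append(e.getValue().inputFormat).append(';')
              .append(e.getValue().mapper);
        }
        return sb.toString();
    }

    /** Rebuilds the bindings from a string produced by encode(). */
    public static PathBindings decode(String encoded) {
        PathBindings result = new PathBindings();
        if (encoded.isEmpty()) {
            return result;
        }
        for (String entry : encoded.split(",")) {
            String[] parts = entry.split(";");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Malformed entry: " + entry);
            }
            result.add(parts[0], parts[1], parts[2]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example paths and class names are placeholders.
        String encoded = new PathBindings()
                .add("/data/seq", "SequenceFileInputFormat", "SeqMapper")
                .add("/data/text", "TextInputFormat", "TextMapper")
                .encode();
        System.out.println(encoded);
        System.out.println(PathBindings.decode(encoded).lookup("/data/text").mapper);
    }
}
```

In a real job the encoded string would be written into the job configuration at submit time and decoded again on the task side, so that each split can be matched to its own input format and mapper.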

On Sun, May 9, 2010 at 2:08 PM, Oded Rosen <oded@legolas-media.com> wrote:

> From what I've learned from different sites around the web (hadoop wiki,
> cloudera <http://www.cloudera.com/blog/2009/05/what%E2%80%99s-new-in-hadoop-core-020/>,
> mail archive, etc.), the MultipleInputs class that was available in the
> 0.18-0.19 versions of hadoop was not moved to the new 0.20 API
> (neither was MultipleOutputs, but that's another story).
>
> I wanted to know if there is a way around this: to use two different paths
> with two different input formats (sequence file, text file) as sources for
> the same job, with a special mapper for each input type, using the hadoop
> 0.20 API. I think that writing a new job using the 0.19 API only means more
> trouble later, when it's officially deprecated.
>
> I saw there is a jira (MAPREDUCE-1170)
> <https://issues.apache.org/jira/browse/MAPREDUCE-1170> open
> for this issue, with a patch marked as "Won't fix".
> If someone out there can help me with this, I will be most thankful.
>
> Cheers,
> --
> Oded
>
