hadoop-common-user mailing list archives

From Subir S <subir.sasiku...@gmail.com>
Subject Re: Handling bad records
Date Tue, 28 Feb 2012 08:49:51 GMT
Can MultipleOutputs be used with Hadoop Streaming?

On Tue, Feb 28, 2012 at 2:07 PM, madhu phatak <phatak.dev@gmail.com> wrote:

> Hi Mohit,
> A and B refer to two different output files (multipart name). The file
> names will be seq-A* and seq-B*. It's similar to the "r" in part-r-00000.
>
> On Tue, Feb 28, 2012 at 11:37 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>
> > Thanks, that's helpful. In that example, what are "A" and "B" referring to?
> > Is that the output file name?
> >
> > mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
> > mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));
> >
> >
> > On Mon, Feb 27, 2012 at 9:53 PM, Harsh J <harsh@cloudera.com> wrote:
> >
> > > Mohit,
> > >
> > > Use the MultipleOutputs API:
> > >
> > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
> > > to have a named output of bad records. There is an example of use
> > > detailed on the link.
> > >
> > > On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > > > What's the best way to write records to a different file? I am doing
> > > > XML processing, and during processing I might come across an invalid XML
> > > > format. Currently I have it in a try/catch block and write to log4j, but I
> > > > think it would be better to write it to an output file that contains only
> > > > errors.
> > >
> > >
> > >
> > > --
> > > Harsh J
> > >
> >
>
>
>
> --
> Join me at http://hadoopworkshop.eventbrite.com/
>
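
The routing decision discussed in this thread (parse each record inside a try/catch and send failures to a separate output) can be sketched without Hadoop. The following is only an illustration under assumptions: the class name BadRecordRouter and the two in-memory lists are hypothetical and stand in for the regular job output and a named "bad records" output. In a real job the `bad` list would presumably be replaced by a MultipleOutputs collector like the ones quoted above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Hadoop): separates well-formed XML
// records from malformed ones instead of only logging the failures.
class BadRecordRouter {
    // Stand-ins for the normal output and the error-only output.
    final List<String> good = new ArrayList<>();
    final List<String> bad = new ArrayList<>();

    // Returns true if the record parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default error handler so parse errors are not
            // printed to stderr; the exception is still thrown.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    // Sends the record to the good or bad list depending on the parse result.
    void route(String record) {
        if (isWellFormed(record)) {
            good.add(record);
        } else {
            bad.add(record);
        }
    }

    public static void main(String[] args) {
        BadRecordRouter router = new BadRecordRouter();
        router.route("<doc><id>1</id></doc>"); // well-formed
        router.route("<doc><id>2</doc>");      // mismatched closing tag
        System.out.println("good=" + router.good.size()
            + " bad=" + router.bad.size());
    }
}
```

Keeping the check in one small method means the mapper or reducer only has to choose which collector receives the record.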
