hadoop-common-user mailing list archives

From Alejandro Abdelnur <tuc...@gmail.com>
Subject Re: Hadoop 0.19, Cascading 1.0 and MultipleOutputs problem
Date Wed, 04 Feb 2009 07:07:57 GMT
Mikhail,

You are right, please open a Jira on this.

Alejandro


On Wed, Jan 28, 2009 at 9:23 PM, Mikhail Yakshin <greycat.na.kor@gmail.com> wrote:

> Hi,
>
> We have a system based on Hadoop 0.18 / Cascading 0.8.1, and I'm now
> trying to port it to Hadoop 0.19 / Cascading 1.0. The first serious
> problem I've run into is that we use MultipleOutputs extensively in
> our jobs, which deal with sequence files that store Cascading's Tuples.
>
> Since Cascading 0.9, Tuples are no longer WritableComparable; they
> implement the generic Hadoop serialization interface and framework instead.
> However, in Hadoop 0.19, MultipleOutputs still requires the older
> WritableComparable interface. Thus, trying to do something like:
>
> MultipleOutputs.addNamedOutput(conf, "output-name",
>     MySpecialMultiSplitOutputFormat.class, Tuple.class, Tuple.class);
> mos = new MultipleOutputs(conf);
> ...
> mos.getCollector("output-name", reporter).collect(tuple1, tuple2);
>
> yields an error:
>
> java.lang.RuntimeException: java.lang.RuntimeException: class cascading.tuple.Tuple not org.apache.hadoop.io.WritableComparable
>        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:752)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getNamedOutputKeyClass(MultipleOutputs.java:252)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs$InternalFileOutputFormat.getRecordWriter(MultipleOutputs.java:556)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getRecordWriter(MultipleOutputs.java:425)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:511)
>        at org.apache.hadoop.mapred.lib.MultipleOutputs.getCollector(MultipleOutputs.java:476)
>        at my.namespace.MyReducer.reduce(MyReducer.java:xxx)
>
> Is there a known workaround for this? Is any work under way to make
> MultipleOutputs use generic Hadoop serialization?
>
> --
> WBR, Mikhail Yakshin
>
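
For readers of the archive, the failure quoted above appears to come from an interface check made when the named output's key class is read back from the job configuration. The following is a minimal, self-contained sketch of that kind of check, not Hadoop's actual code: `LegacyKey`, `LegacyText`, `GenericTuple`, `lookupClass` and the `key.class` property name are all hypothetical stand-ins (for WritableComparable, a Writable-based key, Cascading's Tuple, and the configuration lookup respectively).

```java
import java.util.HashMap;
import java.util.Map;

public class InterfaceCheckSketch {

    // Hypothetical stand-in for the older WritableComparable interface.
    interface LegacyKey extends Comparable<Object> {}

    // Hypothetical key type that implements the legacy interface.
    static class LegacyText implements LegacyKey {
        @Override
        public int compareTo(Object other) {
            return 0;
        }
    }

    // Hypothetical stand-in for a Tuple that relies on generic
    // serialization and does not implement the legacy interface.
    static class GenericTuple {}

    // Looks up a class name stored in the configuration map and insists
    // that the class implements the requested interface (an assumption
    // about the kind of check behind the quoted error message).
    static <U> Class<? extends U> lookupClass(Map<String, String> conf,
                                              String name,
                                              Class<U> xface) {
        try {
            Class<?> cls = Class.forName(conf.get(name));
            if (!xface.isAssignableFrom(cls)) {
                throw new RuntimeException(cls + " not " + xface.getName());
            }
            return cls.asSubclass(xface);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // Returns true if the given key class survives the interface check.
    static boolean passesCheck(Class<?> keyClass) {
        Map<String, String> conf = new HashMap<>();
        conf.put("key.class", keyClass.getName());
        try {
            lookupClass(conf, "key.class", LegacyKey.class);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("LegacyText passes: " + passesCheck(LegacyText.class));
        System.out.println("GenericTuple passes: " + passesCheck(GenericTuple.class));
    }
}
```

In this sketch the check rejects any class that does not implement the required interface, regardless of whether some other serialization mechanism could handle it, which is consistent with the behaviour Mikhail describes.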
