hadoop-common-user mailing list archives

From "Alex Loddengaard" <alex...@google.com>
Subject Re: Using JavaSerialization and SequenceFileInput
Date Wed, 17 Sep 2008 02:10:59 GMT
Unfortunately I don't know of a solution to your problem, but I've been
experiencing the exact same issues while trying to implement a Protocol
Buffer serialization.  Take a look:

<https://issues.apache.org/jira/browse/HADOOP-3788>

I hope this helps you diagnose your problem.

Alex

On Wed, Sep 17, 2008 at 12:47 AM, Jason Grey <jason.grey.work@gmail.com> wrote:

> *HeadlineDocument* in the code below is equivalent to *MyObject* - I forgot
> to obfuscate that one... oops...
>
> On Tue, Sep 16, 2008 at 11:46 AM, Jason Grey <jason.grey.work@gmail.com> wrote:
>
> > I'm trying to use JavaSerialization for a series of MapReduce jobs, and
> > when it comes to reading a SequenceFile using SequenceFileInputFormat
> > with JavaSerialized objects, something breaks down.
> >
> > I've added "org.apache.hadoop.io.serializer.JavaSerialization" to the
> > io.serializations property in my config, and I'm using native Java types
> > in my mapper and reducer implementations, like so:
> >
> > MyMapper implements Mapper<String,MyObject,String,MyObject>
> > MyReducer implements Reducer<String,MyObject,String,MyObject>
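> >
> > (For reference, that config change looks roughly like this - the exact
> > call is my paraphrase, but the property name is io.serializations, and
> > WritableSerialization has to stay on the list or the built-in Writable
> > types stop resolving:)
> >
> > conf.setStrings("io.serializations",
> >     "org.apache.hadoop.io.serializer.WritableSerialization",
> >     "org.apache.hadoop.io.serializer.JavaSerialization");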
> >
> > In my job configuration, I'm doing this:
> >
> > conf.setInputFormat(SequenceFileInputFormat.class);
> > FileInputFormat.setInputPaths(conf, path1, path2);
> > conf.setOutputFormat(SequenceFileOutputFormat.class);
> > FileOutputFormat.setOutputPath(conf, path3);
> > conf.setOutputKeyClass(String.class);
> > conf.setOutputKeyComparatorClass(JavaSerializationComparator.class);
> > conf.setOutputValueClass(MyObject.class);
> > conf.setMapperClass(MyMapper.class);
> > conf.setReducerClass(MyReducer.class);
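> >
> > (One thing I've ruled out: my map output types match the job output
> > types, so I haven't declared them separately. If they differed, I believe
> > they would need to be set explicitly, along these lines:
> >
> > conf.setMapOutputKeyClass(String.class);
> > conf.setMapOutputValueClass(MyObject.class);
> >
> > but that shouldn't apply here.)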
> >
> > When I run the job and print the keys & values from the mapper to
> > System.out, the key & value don't seem to be getting populated
> > correctly - the key is null, and the value is a new, empty instance of
> > MyObject.
> >
> > The files this job is reading were output by another job that used a
> > custom InputFormat, so it didn't have the same problem. I have validated
> > using a SequenceFile.Reader that the data is actually there and non-null.
> > One strange thing I had to do to get the reader to work is this (see the
> > *bold* text - I had to add that for the values to show up, and I think it
> > may have something to do with why SequenceFileInputFormat is having
> > trouble as well...):
> >
> > String key = new String();
> > while (*(key = (String)* r.next(key)) != null) {
> >     HeadlineDocument value = new HeadlineDocument();
> >     *value = (HeadlineDocument)* r.getCurrentValue(value);
> >     System.out.println("Key: " + key.toString());
> >     System.out.println("Value: " + value.toString());
> > }
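> >
> > (My working theory on the reassignment: the serialization framework's
> > Deserializer is allowed to return a brand-new object rather than fill in
> > the one passed to it, and plain Java serialization can't repopulate an
> > existing instance, so next() and getCurrentValue() hand back fresh
> > objects. Here's a self-contained version of my reader loop in case anyone
> > wants to reproduce this - DumpHeadlines is just a throwaway name, and the
> > input path comes from the command line:)
> >
> > import java.io.IOException;
> > import org.apache.hadoop.conf.Configuration;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.io.SequenceFile;
> >
> > public class DumpHeadlines {
> >   public static void main(String[] args) throws IOException {
> >     Configuration conf = new Configuration();
> >     // JavaSerialization has to be registered here too, or the reader
> >     // can't find a deserializer for String / HeadlineDocument.
> >     conf.setStrings("io.serializations",
> >         "org.apache.hadoop.io.serializer.WritableSerialization",
> >         "org.apache.hadoop.io.serializer.JavaSerialization");
> >     FileSystem fs = FileSystem.get(conf);
> >     SequenceFile.Reader r =
> >         new SequenceFile.Reader(fs, new Path(args[0]), conf);
> >     // The returned objects must be captured: the instances passed in
> >     // are only reused by Writable-style serializations.
> >     // HeadlineDocument is my own Serializable class.
> >     String key = null;
> >     while ((key = (String) r.next(key)) != null) {
> >       HeadlineDocument value =
> >           (HeadlineDocument) r.getCurrentValue(new HeadlineDocument());
> >       System.out.println("Key: " + key);
> >       System.out.println("Value: " + value);
> >     }
> >     r.close();
> >   }
> > }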
> >
> > Anyone got any hints as to how one uses JavaSerialization properly in the
> > INPUT phase of a MapReduce job?
> >
> > Thanks for any help
> >
> > -jg-
> >
>
