hadoop-common-user mailing list archives

From Philip Zeyliger <phi...@cloudera.com>
Subject Re: Custom InputFormat, problem with constructors
Date Fri, 11 Dec 2009 19:20:15 GMT
Hi Antonio,

Check out MapTask.java.  When your job gets instantiated on the cluster, an
InputSplit object is created for the task, using reflection.  An InputSplit
is a Writable, and, like all writables, it gets created with an empty
constructor and initialized with readFields().

If you implement write() and readFields() correctly (think of these as
serialization and de-serialization functions), it should all work.  See
FileSplit for an example of how FileInputFormat does it.
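
For your case, a rough (untested) sketch of the pair, dropped into your myInputSplit and
assuming rows is the only field the split carries (DataInput/DataOutput are from java.io):

  public void write(DataOutput out) throws IOException {
    // serialize: length first, then each value
    out.writeInt(rows.length);
    for (long row : rows) {
      out.writeLong(row);
    }
  }

  public void readFields(DataInput in) throws IOException {
    // deserialize: must mirror write() exactly; this is what fills in the
    // object the framework created with the empty constructor
    int length = in.readInt();
    rows = new long[length];
    for (int i = 0; i < length; i++) {
      rows[i] = in.readLong();
    }
  }

(The empty constructor stays; the framework needs it.)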

Cheers,

-- Philip

Here's the relevant excerpt from MapTask.java (splitClass and split are fields of the task, set before this method runs):


  void runOldMapper(final JobConf job,
                    final BytesWritable rawSplit,
                    final TaskUmbilicalProtocol umbilical,
                    TaskReporter reporter
                    ) throws IOException, InterruptedException,
                             ClassNotFoundException {
    InputSplit inputSplit = null;
    // reinstantiate the split
    try {
      inputSplit = (InputSplit)
        ReflectionUtils.newInstance(job.getClassByName(splitClass), job);
    } catch (ClassNotFoundException exp) {
      IOException wrap = new IOException("Split class " + splitClass +
                                         " not found");
      wrap.initCause(exp);
      throw wrap;
    }
    DataInputBuffer splitBuffer = new DataInputBuffer();
    splitBuffer.reset(split.getBytes(), 0, split.getLength());
    inputSplit.readFields(splitBuffer);
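
You can reproduce that round trip yourself to see why the no-arg constructor matters. A
rough, untested sketch (DataOutputBuffer/DataInputBuffer are in org.apache.hadoop.io,
ReflectionUtils in org.apache.hadoop.util; conf is whatever JobConf you already have):

  DataOutputBuffer out = new DataOutputBuffer();
  myInputSplit original = new myInputSplit(new long[] {0, 1, 2});
  original.write(out);                      // your write() serializes the fields

  DataInputBuffer in = new DataInputBuffer();
  in.reset(out.getData(), 0, out.getLength());

  // Same pattern as MapTask: empty constructor via reflection, then readFields()
  myInputSplit copy = ReflectionUtils.newInstance(myInputSplit.class, conf);
  copy.readFields(in);                      // rows is restored here, not in a constructor

If the copy comes back with rows populated, your split will survive the trip to the mappers.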



On Fri, Dec 11, 2009 at 11:03 AM, Antonio D'Ettole <codazzo@gmail.com> wrote:

> Hi,
>
> I've been trying to code a pretty simple InputFormat. The idea is this: I
> have an array of numbers (say, the range [0-5000]) and I want each mapper
> to receive a split of size 500, i.e. 500 LongWritables.
>
> This is an excerpt from the class extending InputSplit:
>
> public class myInputSplit extends InputSplit implements Writable {
>
>     long[] rows;
>
>     myInputSplit() { }
>
>     public myInputSplit(long[] rows) {
>         this.rows = rows;
>     }
>
>     .....
> }
>
> I also wrote the classes myInputFormat and myRecordReader (omitted).
>
> Now, the default constructor in the class above doesn't do much, but I had
> to put it there anyway because Hadoop was throwing an exception at runtime
> when it couldn't find said constructor. Obviously myInputFormat uses the
> right constructor with the long[] argument, but Hadoop somehow seems to give
> the mapper input splits which have been built using the default constructor,
> which is used nowhere in my code. I can tell because I put a breakpoint in
> the default constructor and yes, it is being called. As a result, all the
> input splits that are processed by the mappers are "broken", as the "rows"
> variable was never set.
> Interestingly, I also put a breakpoint in the _right_ constructor and it is
> also being called, by the getSplits() method in myInputFormat (which is what
> one would expect).
>
> Does anybody have an idea why the default constructor is being called?
>
> I hope I was clear enough, thanks for your time.
> Antonio
>
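
For completeness, a getSplits() for the scheme described above (the range [0, 5000)
carved into chunks of 500) might look roughly like this untested sketch, again assuming
the new mapreduce API:

  public List<InputSplit> getSplits(JobContext context) {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    for (long start = 0; start < 5000; start += 500) {
      long[] rows = new long[500];
      for (int i = 0; i < rows.length; i++) {
        rows[i] = start + i;
      }
      splits.add(new myInputSplit(rows));   // the long[] constructor runs here, on the client
    }
    return splits;
  }

Each split is then serialized with write(), shipped to a task, and rebuilt there via the
empty constructor plus readFields().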
