hadoop-common-user mailing list archives

From yunming zhang <zhangyunming1...@gmail.com>
Subject All datanodes are bad IOException when trying to implement multithreading serialization
Date Sun, 29 Sep 2013 20:52:57 GMT
Hi,

I was playing with the Hadoop code, trying to have a single Mapper read an
input split using multiple threads. I am getting an "All datanodes are bad"
IOException, and I am not sure what the issue is.

The reason for this work is that I suspect my computation is slow because it
takes too long to create the Text() objects from the input split using a
single thread. I tried to modify LineRecordReader (since I am mostly using
TextInputFormat) to provide additional methods for retrieving lines from the
input split: getCurrentKey2(), getCurrentValue2(), nextKeyValue2(). I created
a second FSDataInputStream and a second LineReader object for
getCurrentKey2() and getCurrentValue2() to read from. Essentially I am trying
to open the input split twice with different start points (one at the very
beginning, the other in the middle of the split) so that two threads can read
from the input split in parallel.
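
Roughly, the state and methods I added to LineRecordReader look like the
sketch below (simplified to the uncompressed case, and the split-boundary
handling is only approximate; the attached file has the full details).
start, end and maxLineLength are the existing fields of LineRecordReader,
and initializeSecondReader() is just a helper name I use here:

    // Fields and methods added inside my modified LineRecordReader.
    // Everything ending in "2" is what I added for the second thread.
    private FSDataInputStream fileIn2;        // second stream over the same file
    private LineReader in2;                   // second LineReader on fileIn2
    private long start2, pos2;                // second reader begins mid-split
    private LongWritable key2 = new LongWritable();
    private Text value2 = new Text();

    // Called from initialize(): open the split's file a second time, seek to
    // the middle of the split, and skip the partial line we land on.
    private void initializeSecondReader(FileSplit split, Configuration job)
        throws IOException {
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(job);
      fileIn2 = fs.open(file);
      start2 = start + (end - start) / 2;
      fileIn2.seek(start2);
      in2 = new LineReader(fileIn2, job);
      start2 += in2.readLine(new Text());     // discard the partial first line
      pos2 = start2;
    }

    // Mirrors nextKeyValue(), but drives the second stream and stops at the
    // end of the split.
    public boolean nextKeyValue2() throws IOException {
      if (pos2 >= end) {
        return false;
      }
      key2.set(pos2);
      int newSize = in2.readLine(value2, maxLineLength);
      if (newSize == 0) {
        return false;
      }
      pos2 += newSize;
      return true;
    }

    public LongWritable getCurrentKey2() { return key2; }
    public Text getCurrentValue2()       { return value2; }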

In the org.apache.hadoop.mapreduce.Mapper.run() method, I read simultaneously
using getCurrentKey() and getCurrentKey2(), with Thread 1 and Thread 2 both
running at the same time:


      Thread 1:
        while (context.nextKeyValue()) {
          map(context.getCurrentKey(), context.getCurrentValue(), context);
        }

      Thread 2:
        while (context.nextKeyValue2()) {
          map(context.getCurrentKey2(), context.getCurrentValue2(), context);
          //System.out.println("two iter");
        }
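
Concretely, my modified run() looks roughly like the sketch below (simplified:
no error handling, and the shared Context and the map() calls are not
synchronized; nextKeyValue2(), getCurrentKey2() and getCurrentValue2() are the
methods I added, not part of the stock Context API):

    // Sketch of my modified org.apache.hadoop.mapreduce.Mapper.run().
    // The same Context is handed to two threads: one drives the original
    // reader methods, the other drives the "*2" methods I added.
    public void run(final Context context) throws IOException, InterruptedException {
      setup(context);

      Thread first = new Thread(new Runnable() {
        public void run() {
          try {
            while (context.nextKeyValue()) {
              map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });

      Thread second = new Thread(new Runnable() {
        public void run() {
          try {
            while (context.nextKeyValue2()) {
              map(context.getCurrentKey2(), context.getCurrentValue2(), context);
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });

      first.start();
      second.start();
      first.join();
      second.join();

      cleanup(context);
    }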

However, this causes the All datanodes are bad exception. I think I made sure
that I closed the second file. I have attached a copy of my LineRecordReader
file to show what I changed to enable two simultaneous reads of the input
split.
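
For reference, close() in my modified LineRecordReader closes both readers,
roughly like this (in and in2 are the original and the second LineReader):

    public synchronized void close() throws IOException {
      try {
        if (in != null) {
          in.close();       // original LineReader
        }
      } finally {
        if (in2 != null) {
          in2.close();      // second LineReader added for thread 2
        }
      }
    }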

I have modified other files (org.apache.hadoop.mapreduce.RecordReader.java,
mapred.MapTask.java, ...) just to enable Mapper.run() to call
LineRecordReader.getCurrentKey2() and the other access methods for the second
file.
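
For example, the declarations added to the abstract RecordReader look roughly
like this (shown abstract for brevity; other RecordReader implementations
would also need these, or non-abstract stubs, to keep compiling):

    // Added to org.apache.hadoop.mapreduce.RecordReader<KEYIN, VALUEIN> and
    // plumbed through mapred.MapTask's context wrappers so that Mapper.run()
    // can reach the second reader in LineRecordReader.
    public abstract boolean nextKeyValue2() throws IOException, InterruptedException;
    public abstract KEYIN getCurrentKey2() throws IOException, InterruptedException;
    public abstract VALUEIN getCurrentValue2() throws IOException, InterruptedException;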


I would really appreciate it if anyone could give me a bit of advice or just
point me in a direction as to where the problem might be.

Thanks

Yunming
