hadoop-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: All datanodes are bad IOException when trying to implement multithreading serialization
Date Sun, 29 Sep 2013 23:59:37 GMT
Wouldn't you rather just change your split size so that you can have more mappers work on your
input? What else are you doing in the mappers?
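
To make the split-size suggestion concrete, here is a rough sketch of the arithmetic. It assumes the split size is computed as max(minSize, min(maxSize, blockSize)), which is what FileInputFormat.computeSplitSize does in the new (mapreduce) API; the 1 GB file length and 64 MB block size are hypothetical numbers picked for illustration. The class and method names in the sketch are made up and are not part of Hadoop.

```java
public class SplitSizeSketch {
    // Assumed to mirror FileInputFormat.computeSplitSize in the mapreduce API.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough estimate only (ceiling division). Real Hadoop lets the last
    // split run slightly over, so the true count can be a little lower.
    static long approxSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 1024 * mb; // hypothetical 1 GB input file
        long blockSize = 64 * mb;    // hypothetical HDFS block size

        // No maximum configured: one split (and so one mapper) per block.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        // Maximum split size capped at 16 MB: four times as many mappers.
        long smallerSplit = computeSplitSize(blockSize, 1L, 16 * mb);

        System.out.println(approxSplits(fileLength, defaultSplit)); // prints 16
        System.out.println(approxSplits(fileLength, smallerSplit)); // prints 64
    }
}
```

In a real job the cap would normally be set with FileInputFormat.setMaxInputSplitSize(job, bytes) or the corresponding maximum-split-size property, rather than computed by hand.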
Sent from my iPad

On Sep 30, 2013, at 2:22 AM, yunming zhang <zhangyunming1990@gmail.com> wrote:

> Hi, 
> 
> I was playing with the Hadoop code, trying to have a single Mapper read an input split
> using multiple threads. I am getting an "All datanodes are bad" IOException, and I am not
> sure what the issue is.
> 
> The reason for this work is that I suspect my computation was slow because it takes too
> long to create the Text() objects from the input split using a single thread. I tried to
> modify LineRecordReader (since I am mostly using TextInputFormat) to provide additional
> methods for retrieving lines from the input split: getCurrentKey2(), getCurrentValue2(),
> and nextKeyValue2(). I created a second FSDataInputStream and a second LineReader object
> for getCurrentKey2() and getCurrentValue2() to read from. Essentially, I am trying to open
> the input split twice with different start points (one at the very beginning, the other in
> the middle of the split) so that two threads can read from the input split in parallel.
> 
> In the org.apache.hadoop.mapreduce.Mapper.run() method, I modified it to read
> simultaneously using getCurrentKey() on Thread 1 and getCurrentKey2() on Thread 2 (both
> threads running at the same time):
>       Thread 1:
>         while (context.nextKeyValue()) {
>             map(context.getCurrentKey(), context.getCurrentValue(), context);
>         }
>
>       Thread 2:
>         while (context.nextKeyValue2()) {
>             map(context.getCurrentKey2(), context.getCurrentValue2(), context);
>             //System.out.println("two iter");
>         }
> 
> However, this causes the "All datanodes are bad" exception. I think I made sure that I
> closed the second file. I have attached a copy of my LineRecordReader file to show what I
> changed to enable two simultaneous reads of the input split.
> 
> I have modified other files (org.apache.hadoop.mapreduce.RecordReader.java,
> mapred.MapTask.java ....) just to enable Mapper.run to call
> LineRecordReader.getCurrentKey2() and the other access methods for the second file.
> 
> 
> I would really appreciate it if anyone could give me a bit of advice or just point me in
> a direction as to where the problem might be.
> 
> Thanks
> 
> Yunming 
> 
> <LineRecordReader.java>
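
For reference, the two-start-point idea described in the quoted message can be sketched outside Hadoop with a local file. This is only an illustration under stated assumptions: it uses java.io.RandomAccessFile instead of FSDataInputStream and LineReader, the class and method names are hypothetical, and it follows the usual line-boundary convention that a reader starting mid-file discards the first line it sees while the earlier reader finishes any line that begins at or before its end offset. It does not reproduce or diagnose the IOException. Each thread gets its own file handle and its own output list, so the two readers share no mutable state.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TwoReaderSketch {
    // Reads the lines that belong to the byte range [start, end] of the file.
    static List<String> readRange(Path file, long start, long end,
                                  boolean skipFirstLine) throws IOException {
        List<String> lines = new ArrayList<>();
        // Each call opens its own handle, so threads never share a stream.
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(start);
            if (skipFirstLine) {
                in.readLine(); // this line is emitted by the previous reader
            }
            // Keep any line that starts at or before 'end', even if it runs past it.
            while (in.getFilePointer() <= end) {
                String line = in.readLine();
                if (line == null) {
                    break; // end of file
                }
                lines.add(line);
            }
        }
        return lines;
    }

    static List<String> readInParallel(Path file) throws Exception {
        long length = Files.size(file);
        long mid = length / 2;
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<String>> first = pool.submit(() -> readRange(file, 0, mid, false));
            Future<List<String>> second = pool.submit(() -> readRange(file, mid, length, true));
            // Merge per-thread results only after both readers have finished.
            List<String> all = new ArrayList<>(first.get());
            all.addAll(second.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("split", ".txt");
        Files.write(tmp, "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readInParallel(tmp)); // prints [alpha, beta, gamma, delta]
        Files.delete(tmp);
    }
}
```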
