hadoop-mapreduce-user mailing list archives

From yunming zhang <zhangyunming1...@gmail.com>
Subject Re: All datanodes are bad IOException when trying to implement multithreading serialization
Date Mon, 30 Sep 2013 01:45:56 GMT
I am actually trying to reduce the number of mappers because my application
takes up a lot of memory (on the order of 1-2 GB of RAM per mapper). I want
to be able to use a few mappers but still maintain good CPU utilization
through multithreading within a single mapper. MultithreadedMapper doesn't
work because it duplicates the in-memory data structures.
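
The goal described above (one large data structure per JVM, several threads of work) can be sketched in plain Java without any Hadoop API. This is a minimal sketch, assuming the structure is read-only once loaded: one copy is built and a fixed thread pool processes records against it, so adding threads does not add copies. `SharedModelWorker`, its `lookup` map, and `process` are hypothetical stand-ins for the application's real data and per-record work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: N worker threads in ONE JVM share ONE read-only structure,
// instead of each thread (or each extra mapper) holding its own copy.
class SharedModelWorker {
    // Built once and never mutated afterwards, so concurrent reads need no locking.
    private final Map<String, Integer> lookup;

    SharedModelWorker(Map<String, Integer> lookup) {
        this.lookup = Collections.unmodifiableMap(new HashMap<>(lookup));
    }

    // Stand-in for the real per-record computation: sums the weights of known tokens.
    private int process(String line) {
        int sum = 0;
        for (String token : line.trim().split("\\s+")) {
            sum += lookup.getOrDefault(token, 0);
        }
        return sum;
    }

    // Runs process() on every line using nThreads workers; results keep input order.
    List<Integer> run(List<String> lines, int nThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String line : lines) {
                futures.add(pool.submit(() -> process(line)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // lets worker threads exit once queued work is done
        }
    }
}
```

Memory then scales with one copy of `lookup` per task rather than one per thread; this only holds if nothing writes to the structure while records are being processed.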

Thanks

Yunming


On Sun, Sep 29, 2013 at 6:59 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:

> Wouldn't you rather just change your split size so that you can have more
> mappers work on your input? What else are you doing in the mappers?
>
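
The effect of changing the split size can be estimated with a simplified plain-Java model of the arithmetic used by Hadoop 2.x's `FileInputFormat` for a single splittable file: the split size is `max(minSize, min(maxSize, blockSize))`, and the file is cut into pieces of that size, with the last piece allowed to be up to 10% oversized (`SPLIT_SLOP = 1.1`). `SplitMath` is a hypothetical name and this is not Hadoop code; in a real job the cap would be set through `mapreduce.input.fileinputformat.split.maxsize` or `FileInputFormat.setMaxInputSplitSize(job, bytes)`.

```java
// Simplified model of FileInputFormat's split arithmetic for one splittable file.
// SplitMath is a hypothetical name; compression and multi-file inputs are ignored.
final class SplitMath {
    // The last split may be up to 10% larger than splitSize instead of leaving a tiny tail.
    static final double SPLIT_SLOP = 1.1;

    // Clamp the block size into [minSize, maxSize], as FileInputFormat.computeSplitSize does.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (and therefore map tasks) for one file of fileLength bytes.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail, at most SPLIT_SLOP * splitSize bytes
        }
        return splits;
    }
}
```

For a 1 GiB file on 128 MiB blocks the model gives 8 splits; capping `maxSize` at 64 MiB gives 16. Each extra split is another map task, and in Yunming's case another 1-2 GB copy of the in-memory data.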
> On Sep 30, 2013, at 2:22 AM, yunming zhang <zhangyunming1990@gmail.com>
> wrote:
>
> Hi,
>
> I was playing with the Hadoop code, trying to have a single Mapper support
> reading an input split using multiple threads. I am getting an "All datanodes
> are bad" IOException, and I am not sure what the issue is.
>
> The reason for this work is that I suspect my computation is slow because
> it takes too long to create the Text objects from the input split using a
> single thread. I modified LineRecordReader (since I am mostly using
> TextInputFormat) to provide additional methods to retrieve lines from the
> input split: getCurrentKey2(), getCurrentValue2(), and nextKeyValue2(). I
> created a second FSDataInputStream and a second LineReader object for
> getCurrentKey2() and getCurrentValue2() to read from. Essentially, I am
> trying to open the input split twice with different start points (one at
> the very beginning, the other in the middle of the split) so that two
> threads can read from the input split in parallel.
>
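
Two readers can share one split without losing or duplicating a line if both follow the convention `LineRecordReader` itself applies at split boundaries: a reader whose range does not begin at offset 0 discards everything up to and including the first newline, and every reader keeps emitting lines as long as the line *starts* at or before its range end, so the line straddling the midpoint is read by the first reader only. The sketch below applies that rule to an in-memory byte array so it runs without Hadoop; `HalfSplitReader`, `readRange`, and `readInTwoHalves` are hypothetical names, and `'\n'`-only line endings are assumed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of LineRecordReader-style boundary rules on an in-memory byte array.
// Assumes '\n' line endings.
final class HalfSplitReader {
    // Returns the lines owned by the byte range [start, end) of data.
    static List<String> readRange(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start != 0) {
            // Discard up to and including the first '\n': the previous range owns that line.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        // Emit every line that STARTS at or before end; the last one may run past end.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    // Reads the two halves concurrently, one thread per half, then concatenates them.
    static List<String> readInTwoHalves(byte[] data) {
        int mid = data.length / 2;
        if (mid == 0) {
            return readRange(data, 0, data.length); // too small to split
        }
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        Thread t1 = new Thread(() -> first.addAll(readRange(data, 0, mid)));
        Thread t2 = new Thread(() -> second.addAll(readRange(data, mid, data.length)));
        t1.start();
        t2.start();
        try {
            t1.join(); // join() also makes each thread's list contents visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> all = new ArrayList<>(first);
        all.addAll(second);
        return all;
    }
}
```

In a real reader each thread would own its own stream (for example a separate `FSDataInputStream` positioned with `seek(start)`, as in the experiment described above); only the boundary rule has to match between the two.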
> In the org.apache.hadoop.mapreduce.Mapper.run() method, I modified it to
> read simultaneously using getCurrentKey() in Thread 1 and getCurrentKey2()
> in Thread 2 (both threads running at the same time):
>       // Thread 1: reads records from the first half of the split
>       while (context.nextKeyValue()) {
>           map(context.getCurrentKey(), context.getCurrentValue(), context);
>       }
>
>       // Thread 2: reads records from the second half of the split
>       while (context.nextKeyValue2()) {
>           map(context.getCurrentKey2(), context.getCurrentValue2(), context);
>       }
>
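
Both loops above call `map()` with the same `context`, so output writes can come from two threads at once. In the Hadoop 2.x source, `MultithreadedMapper` avoids this by giving each worker thread a wrapper whose `nextKeyValue()` and `write()` synchronize on the shared outer context. A minimal plain-Java sketch of that locking pattern, assuming a hypothetical `SyncOutput` sink standing in for `context.write`: the two threads do their work independently and only the shared emit step is serialized. This illustrates the pattern only and is not offered as the cause of the exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared Context's write(): every method locks the sink,
// much as MultithreadedMapper synchronizes on the real context.
final class SyncOutput {
    private final List<String> records = new ArrayList<>();

    synchronized void write(String key, String value) {
        records.add(key + "\t" + value);
    }

    synchronized int size() {
        return records.size();
    }
}

final class TwoThreadMapRun {
    // Stand-in for map(): emits one record per input line, keyed by its index in its half.
    static void map(long key, String value, SyncOutput out) {
        out.write(Long.toString(key), value);
    }

    // Maps two disjoint record lists on two threads that share a single SyncOutput.
    static SyncOutput run(List<String> firstHalf, List<String> secondHalf) {
        SyncOutput out = new SyncOutput();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < firstHalf.size(); i++) {
                map(i, firstHalf.get(i), out);
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < secondHalf.size(); i++) {
                map(i, secondHalf.get(i), out);
            }
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Without `synchronized` on `write`, the unsynchronized `ArrayList` could lose records or throw under concurrent appends; with it, every emitted record is kept.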
> However, this causes the "All datanodes are bad" exception. I think I
> made sure that I closed the second file. I have attached a copy of my
> LineRecordReader file to show what I changed to enable two simultaneous
> reads from the input split.
>
> I have modified other files (org.apache.hadoop.mapreduce.RecordReader.java,
> mapred.MapTask.java, ...) just to enable Mapper.run() to call
> LineRecordReader.getCurrentKey2() and the other access methods for the
> second file.
>
>
> I would really appreciate it if anyone could give me some advice or point
> me in the direction of where the problem might be.
>
> Thanks
>
> Yunming
>
> <LineRecordReader.java>
>
>
