hadoop-common-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Problem in reading Map Output file via RecordReader<ImmutableBytesWritable, Put>
Date Wed, 30 Jan 2013 22:31:44 GMT
Hi All,

I am using HBase 0.92.1. I am trying to break HBase bulk loading into
multiple MR jobs, because I want to populate more than one HBase table from a
single CSV file. I have looked into the MultiTableOutputFormat class, but it
does not solve my problem because it does not generate HFiles.

I modified the HBase bulk loader job and removed the reduce phase, so
that I can generate <ImmutableBytesWritable, Put> output for multiple
tables in one MR job (phase 1).
Then I ended up writing an InputFormat that reads <ImmutableBytesWritable,
Put> pairs, so I can use it to read the output of the phase-1 mappers and
generate the HFiles for each table.
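For context, the two-phase wiring I mean looks roughly like the sketch below. CsvToPutMapper and the job names are placeholders, not real classes; the sketch assumes SequenceFileOutputFormat/SequenceFileInputFormat carry the <ImmutableBytesWritable, Put> pairs between the jobs (Put implements Writable in 0.92, so it can be a SequenceFile value):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Sketch only: CsvToPutMapper and the job names are illustrative placeholders.
Configuration conf = new Configuration();

// Phase 1: map-only job that parses the CSV and emits
// <ImmutableBytesWritable, Put> for every target table.
Job phase1 = new Job(conf, "csv-to-puts");
phase1.setMapperClass(CsvToPutMapper.class);            // hypothetical mapper
phase1.setNumReduceTasks(0);                            // no reduce phase, as described
phase1.setOutputKeyClass(ImmutableBytesWritable.class);
phase1.setOutputValueClass(Put.class);
phase1.setOutputFormatClass(SequenceFileOutputFormat.class);

// Phase 2 (run once per table): read the pairs back and generate the HFiles.
Job phase2 = new Job(conf, "puts-to-hfiles");
phase2.setInputFormatClass(SequenceFileInputFormat.class);
```

With SequenceFile carrying the pairs, phase 2 would not need a hand-written RecordReader at all.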

I implemented a RecordReader assuming that I can call
readFields(DataInput) on an ImmutableBytesWritable and then on a Put.

As per my understanding, the format of the input file (the output files of
the phase-1 mappers) is <serialized ImmutableBytesWritable><serialized Put>.
However, when I try to read the file that way, the length of the
ImmutableBytesWritable comes out wrong and the job throws an
OutOfMemoryError because of it. In my use case the ImmutableBytesWritable
(row key) should never be larger than 32 bytes, but the length read from the
input is 808460337 bytes. I am pretty sure that either my understanding of
the file format is wrong or my RecordReader implementation has a problem.
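The number itself may be a clue: 808460337 is 0x30302031, i.e. the ASCII bytes '0', '0', ' ', '1' read as one big-endian int. So it looks like readFields() is consuming plain text (possibly TextOutputFormat output rather than binary Writable records) as a length prefix. A minimal stdlib-only illustration of that mismatch (FramingDemo and readLengthPrefix are names I made up for the sketch):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class FramingDemo {
    // Interpret the first four bytes of a stream as a Writable length field,
    // the way ImmutableBytesWritable.readFields() would via readInt().
    static int readLengthPrefix(byte[] fileStart) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileStart));
        return in.readInt(); // big-endian 32-bit int
    }

    public static void main(String[] args) throws IOException {
        // If the phase-1 output file actually starts with ASCII text such
        // as "00 1", those characters get misread as a binary length.
        byte[] textStart = "00 1".getBytes("US-ASCII");
        int bogusLength = readLengthPrefix(textStart);
        // '0','0',' ','1' = 0x30 0x30 0x20 0x31 = 808460337: readFields()
        // would then try to allocate an ~800 MB byte array, hence the OOM.
        System.out.println(bogusLength); // prints 808460337
    }
}
```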

Can someone tell me the correct way to deserialize the mapper output file?
Or is there some problem with my code?
Here is the link to my initial stab at a RecordReader:
Thanks & Regards,
Anil Gupta
