From: anil gupta
Date: Wed, 30 Jan 2013 14:31:44 -0800
Subject: Problem in reading Map Output file via RecordReader
To: user@hbase.apache.org, "common-user@hadoop.apache.org"

Hi All,

I am using HBase 0.92.1. I am trying to break the HBase bulk loading into multiple MR jobs, since I want to populate more than one HBase table from a single CSV file. I have looked into the MultiTableOutputFormat class, but it doesn't solve my purpose because it does not generate HFiles.

I modified the HBase bulk loader job and removed the reducer phase so that I can generate <ImmutableBytesWritable, Put> output for multiple tables in one MR job (phase 1).
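
To make the setup concrete, here is a simplified sketch of the kind of phase 1 mapper I mean (the class name, CSV layout, and column family are just placeholders, not my actual code, and how each record gets tagged with its target table is left out):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Phase 1: a map-only job (setNumReduceTasks(0)), so these pairs land directly
// in the job output files instead of going through a reducer.
public class CsvToPutMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] fields = line.toString().split(",");
    byte[] rowKey = Bytes.toBytes(fields[0]);  // placeholder: first CSV column as row key
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}
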
Then I ended up writing an InputFormat that reads <ImmutableBytesWritable, Put>, so that I can use it to read the output of the phase 1 mappers and generate the HFiles for each table.

I implemented a RecordReader, assuming that I can use readFields(DataInput) to read the ImmutableBytesWritable and the Put, respectively.
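
In essence, the record reader does something like the following (a stripped-down sketch of the approach only; my actual code is at the link further down, and the names here are simplified):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class PutRecordReader extends RecordReader<ImmutableBytesWritable, Put> {

  private FSDataInputStream in;
  private long start;
  private long end;
  private ImmutableBytesWritable key;
  private Put value;

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context) throws IOException {
    FileSplit fileSplit = (FileSplit) split;
    Path path = fileSplit.getPath();
    FileSystem fs = path.getFileSystem(context.getConfiguration());
    in = fs.open(path);
    start = fileSplit.getStart();
    end = start + fileSplit.getLength();
    in.seek(start);
  }

  @Override
  public boolean nextKeyValue() throws IOException {
    if (in.getPos() >= end) {
      return false;
    }
    // Assumption under test: the file is just <serialized key><serialized Put>
    // back to back, so readFields() on each Writable should line up with the stream.
    key = new ImmutableBytesWritable();
    value = new Put();
    key.readFields(in);
    value.readFields(in);
    return true;
  }

  @Override
  public ImmutableBytesWritable getCurrentKey() {
    return key;
  }

  @Override
  public Put getCurrentValue() {
    return value;
  }

  @Override
  public float getProgress() throws IOException {
    if (end == start) {
      return 1.0f;
    }
    return Math.min(1.0f, (in.getPos() - start) / (float) (end - start));
  }

  @Override
  public void close() throws IOException {
    if (in != null) {
      in.close();
    }
  }
}
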

As per my understanding, the format of the input file (the output files of the phase 1 mappers) is <serialized ImmutableBytesWritable><serialized Put>. However, when I try to read the file like that, the size of the ImmutableBytesWritable comes out wrong and it throws an OOM because of that. The size of the ImmutableBytesWritable (the row key) should not be greater than 32 bytes for my use case, but as per the input it is 808460337 bytes. I am pretty sure that either my understanding of the input format is wrong or my implementation of the record reader has some problem.
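
For reference, the round trip I am assuming should hold looks like this in isolation (a standalone sketch, not code from my job; the row key and column names are made up):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;

public class WritableRoundTrip {
  public static void main(String[] args) throws Exception {
    // Serialize one <ImmutableBytesWritable, Put> pair back to back, the way I
    // assume the phase 1 output files are laid out.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);

    byte[] rowKey = Bytes.toBytes("row-0001");
    ImmutableBytesWritable key = new ImmutableBytesWritable(rowKey);
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));

    key.write(out);
    put.write(out);
    out.flush();

    // Read the pair back with readFields(), mirroring what my RecordReader does.
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    ImmutableBytesWritable keyBack = new ImmutableBytesWritable();
    Put putBack = new Put();
    keyBack.readFields(in);
    putBack.readFields(in);

    System.out.println("row key length after round trip: " + keyBack.get().length);
  }
}
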

Can someone tell me the correct way of deserializing the output file of the mapper? Or is there some problem with my code?
Here is the link to my initial stab at the RecordReader: https://dl.dropbox.com/u/64149128/ImmutableBytesWritable_Put_RecordReader.java
--
Thanks & Regards,
Anil Gupta