From: Mohammad Tariq
Date: Thu, 7 Feb 2013 17:10:52 +0530
Subject: Re: MapReduce to load data in HBase
To: user@hadoop.apache.org

You might find these links helpful:

http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026

http://stackoverflow.com/questions/13877077/how-do-i-set-an-object-as-the-value-for-map-output-in-hadoop-mapreduce/13877688#13877688

HTH

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Feb 7, 2013 at 5:05 PM, Panshul Whisper wrote:

> Hello,
>
> Thank you for the reply.
> 1. I cannot serialize the JSON and store it as a whole. I need to extract
> individual values and store them, because later I need to query the stored
> values in various aggregation algorithms.
> 2. Could you please point me to where I can find out how to write a
> data type that is Writable+Comparable? I will look into Avro, but I prefer
> to write my own data type.
> 3. I will look into MR counters.
>
> Regards,
>
>
> On Thu, Feb 7, 2013 at 12:28 PM, Mohammad Tariq wrote:
>
>> Hello Panshul,
>>
>> My answers:
>> 1- You can serialize the entire JSON into a byte[] and store it in a
>> cell. (Is it important for you to extract individual values from your
>> JSON and then put them into the table?)
>> 2- You can write your own datatype to pass your object to the reducer.
>> But it must be a Writable+Comparable. Alternatively, you can use Avro.
>> 3- For generating unique keys, you can use MR counters.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>> cloudfront.blogspot.com
>>
>>
>> On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper wrote:
>>
>>> Hello,
>>>
>>> I am trying to write MapReduce jobs to read data from JSON files and
>>> load it into HBase tables.
>>> Please suggest an efficient way to do it. I am trying to do it using
>>> Spring Data HBase Template to make it thread safe and enable table
>>> locking.
>>>
>>> I use the Map methods to read and parse the JSON files. I use the
>>> Reduce methods to call the HBase Template and store the data in the
>>> HBase tables.
>>>
>>> My questions:
>>> 1. Is this the right approach, or should I do all of the above in the
>>> Map method?
>>> 2. How can I pass the Java object I create, which holds the data read
>>> from the JSON file and needs to be saved to the HBase table, to the
>>> Reduce method? I can only pass the built-in data types to the reduce
>>> method from my mapper.
>>> 3. I thought of using the distributed cache for the above problem: store
>>> the object in the cache and pass only the key to the reduce method.
>>> But how do I generate a unique key for each object I store in the
>>> distributed cache?
>>>
>>> Please help me with the above. Please tell me if I am missing or
>>> overlooking some important detail.
>>>
>>> Thanking You,
>>>
>>>
>>> --
>>> Regards,
>>> Ouch Whisper
>>> 010101010101
>>>
>>
>>
>
>
> --
> Regards,
> Ouch Whisper
> 010101010101
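For the question about writing a Writable+Comparable data type, a minimal sketch follows. It is an illustration under assumptions, not code from the thread: the class name `JsonRecord` and its three fields are hypothetical, and so that it runs without Hadoop on the classpath it only mirrors the `write(DataOutput)` and `readFields(DataInput)` signatures of `org.apache.hadoop.io.Writable` instead of implementing the interface. In a real job the class would be declared as `implements WritableComparable<JsonRecord>`. As far as the Hadoop API goes, only map output keys need to be comparable; map output values need only be Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical record parsed from one JSON object. With Hadoop available this
// would be "implements org.apache.hadoop.io.WritableComparable<JsonRecord>".
class JsonRecord implements Comparable<JsonRecord> {
    private String deviceId = "";
    private long timestamp;
    private double value;

    // Hadoop creates Writables reflectively, so a no-arg constructor is needed.
    JsonRecord() {}

    JsonRecord(String deviceId, long timestamp, double value) {
        this.deviceId = deviceId;
        this.timestamp = timestamp;
        this.value = value;
    }

    // Same signature as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(deviceId);
        out.writeLong(timestamp);
        out.writeDouble(value);
    }

    // Same signature as Writable.readFields: read the fields in the same order.
    public void readFields(DataInput in) throws IOException {
        deviceId = in.readUTF();
        timestamp = in.readLong();
        value = in.readDouble();
    }

    // Sort order used when the type is a key: device, then time, then value.
    @Override
    public int compareTo(JsonRecord other) {
        int byDevice = deviceId.compareTo(other.deviceId);
        if (byDevice != 0) return byDevice;
        int byTime = Long.compare(timestamp, other.timestamp);
        return byTime != 0 ? byTime : Double.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof JsonRecord && compareTo((JsonRecord) o) == 0;
    }

    // A stable hashCode matters because the default partitioner hashes the key.
    @Override
    public int hashCode() {
        return Objects.hash(deviceId, timestamp, value);
    }

    // Helper for trying the round trip locally, outside a MapReduce job.
    static byte[] toBytes(JsonRecord record) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static JsonRecord fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            JsonRecord record = new JsonRecord();
            record.readFields(in);
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The mapper would fill such an object from the parsed JSON and emit it as the map output key or value; the job would then need the matching map output key/value class set in its configuration.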
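On generating unique keys: the thread suggests MR counters but does not show how. Hadoop counters are incremented inside each task and aggregated by the framework, so a task should not depend on reading a global running total. The sketch below therefore shows a different, commonly used technique, named plainly as a substitute: prefix a task-local sequence number with an identifier that is unique per task. The class `TaskScopedKeyGenerator` is hypothetical. In a real mapper the identifier might be derived from `context.getTaskAttemptID()`; whether to use the attempt ID or the underlying task ID depends on how retried attempts should behave, which is an assumption left open here.

```java
// Hypothetical helper: keys are unique across tasks as long as every task
// constructs the generator with a distinct taskId.
class TaskScopedKeyGenerator {
    private final String taskId;
    private long sequence; // task-local, starts at 0

    TaskScopedKeyGenerator(String taskId) {
        this.taskId = taskId;
    }

    // Returns "<taskId>-<zero-padded sequence>"; the padding keeps keys from
    // one task in generation order when they are sorted as strings.
    String nextKey() {
        return taskId + "-" + String.format("%010d", sequence++);
    }
}
```

One generator would typically be created once per task (for example in the mapper's `setup` method) and `nextKey()` called once per record.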