hadoop-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: MapReduce to load data in HBase
Date Thu, 07 Feb 2013 11:28:28 GMT
Hello Panshul,

    My answers:
1- You can serialize the entire JSON into a byte[] and store it in a
cell. (Is it important for you to extract individual values from your JSON
and then put them into the table?)
2- You can write your own datatype to pass your object to the reducer, but
it must implement Writable (and WritableComparable if you use it as a key).
Alternatively, you can use Avro.
3- For generating unique keys, you can use MR counters.
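
To make point 2 concrete, here is a minimal sketch of such a datatype. The class name JsonRecord and its two fields are hypothetical, chosen only for illustration. So that it compiles without Hadoop on the classpath, it implements plain Comparable; in a real job you would declare it as implementing org.apache.hadoop.io.WritableComparable<JsonRecord>, which requires the same write(DataOutput), readFields(DataInput) and compareTo methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record type. In a real job, replace "Comparable<JsonRecord>"
// with org.apache.hadoop.io.WritableComparable<JsonRecord>.
final class JsonRecord implements Comparable<JsonRecord> {
    private String rowKey = "";
    private String json = "";

    // Hadoop creates Writable instances reflectively, so keep a no-arg constructor.
    public JsonRecord() {}

    public JsonRecord(String rowKey, String json) {
        this.rowKey = rowKey;
        this.json = json;
    }

    public String getRowKey() { return rowKey; }
    public String getJson() { return json; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(rowKey);
        // writeUTF is limited to 64 KB, so the JSON body is length-prefixed instead.
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        out.writeInt(body.length);
        out.write(body);
    }

    // Read the fields back in exactly the order write() emitted them.
    public void readFields(DataInput in) throws IOException {
        rowKey = in.readUTF();
        byte[] body = new byte[in.readInt()];
        in.readFully(body);
        json = new String(body, StandardCharsets.UTF_8);
    }

    // Sort order used when the type is a map output key.
    @Override
    public int compareTo(JsonRecord other) {
        return rowKey.compareTo(other.rowKey);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof JsonRecord
                && rowKey.equals(((JsonRecord) o).rowKey)
                && json.equals(((JsonRecord) o).json);
    }

    @Override
    public int hashCode() {
        return 31 * rowKey.hashCode() + json.hashCode();
    }

    // Helper for trying the round trip outside a job: record -> byte[].
    public byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Helper for trying the round trip outside a job: byte[] -> record.
    public static JsonRecord fromBytes(byte[] data) {
        JsonRecord record = new JsonRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

The toBytes() and fromBytes() helpers are not part of the Writable contract; they are only there so you can check the round trip yourself. The byte[] that toBytes() returns is also the kind of value you could store in a single cell, as in point 1.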

Warm Regards,

On Thu, Feb 7, 2013 at 4:52 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:

> Hello,
> I am trying to write MapReduce jobs to read data from JSON files and load
> it into HBase tables.
> Please suggest an efficient way to do it. I am trying to do it using
> Spring Data Hbase Template to make it thread safe and enable table locking.
> I use the Map methods to read and parse the JSON files. I use the Reduce
> methods to call the HBase Template and store the data into the HBase tables.
> My questions:
> 1. Is this the right approach or should I do all of the above in the Map
> method?
> 2. How can I pass the Java Object I create holding the data read from the
> Json file to the Reduce method, which needs to be saved to the HBase table?
> I can only pass the inbuilt data types to the reduce method from my mapper.
> 3. I thought of using the distributed cache for the above problem, to
> store the object in the cache and pass only the key to the reduce method.
> But how do I generate the unique key for all the objects I store in the
> distributed cache.
> Please help me with the above. Please tell me if I am missing some detail
> or overlooking some important detail.
> Thanking You,
> --
> Regards,
> Ouch Whisper
> 010101010101
