hadoop-hdfs-user mailing list archives

From Panshul Whisper <ouchwhis...@gmail.com>
Subject MapReduce to load data in HBase
Date Thu, 07 Feb 2013 11:22:01 GMT

I am trying to write MapReduce jobs that read data from JSON files and load
it into HBase tables.
Please suggest an efficient way to do this. I am trying to do it with the
Spring Data HBase Template, to make it thread safe and enable table locking.

I use the Map methods to read and parse the JSON files. I use the Reduce
methods to call the HBase Template and store the data into the HBase tables.

My questions:
1. Is this the right approach, or should I do all of the above in the Map
method?
2. How can I pass the Java object that holds the data read from the JSON
file to the Reduce method, where it needs to be saved to the HBase table?
I can only pass the built-in data types from my mapper to the reduce method.
3. I thought of using the distributed cache for the above problem: store
the object in the cache and pass only its key to the reduce method. But how
do I generate a unique key for every object I store in the distributed
cache?
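
To make question 2 concrete, here is a minimal sketch of the kind of object I
would like to hand from the mapper to the reducer. It uses only java.io and
gives the class the same two methods that Hadoop's Writable interface
declares, write(DataOutput) and readFields(DataInput). The class name
JsonRecord, its two fields, and the toBytes/fromBytes helpers are hypothetical
and only for illustration. I have not wired this into a real job, so whether
this is the right direction is part of what I am asking.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type; the field names are assumptions for illustration.
class JsonRecord {
    String rowKey;   // assumed: the HBase row key taken from the JSON document
    String payload;  // assumed: the remaining JSON content as a string

    // No-arg constructor so an instance can be created first and then
    // populated through readFields.
    JsonRecord() { }

    JsonRecord(String rowKey, String payload) {
        this.rowKey = rowKey;
        this.payload = payload;
    }

    // Same signature as Writable.write(DataOutput).
    // Note: writeUTF is limited to 65535 encoded bytes per string.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(rowKey);
        out.writeUTF(payload);
    }

    // Same signature as Writable.readFields(DataInput); fields must be read
    // in the same order in which write emitted them.
    public void readFields(DataInput in) throws IOException {
        rowKey = in.readUTF();
        payload = in.readUTF();
    }

    // Helper for a local round-trip check outside of Hadoop.
    byte[] toBytes() {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper that rebuilds a record from the bytes produced by toBytes.
    static JsonRecord fromBytes(byte[] bytes) {
        try {
            JsonRecord record = new JsonRecord();
            record.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If this is roughly right, I assume the class would additionally declare
"implements org.apache.hadoop.io.Writable" and be used as the map output
value type, with the row key doubling as the unique key from question 3.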

Please help me with the above, and tell me if I am missing or overlooking
some important detail.

Thanking You,

Ouch Whisper
