hbase-user mailing list archives

From Narayanan K <knarayana...@gmail.com>
Subject Aggregation while Bulk Loading into HBase
Date Wed, 28 Nov 2012 13:37:44 GMT
Hi all,

I have a scenario where I need to do aggregation while bulk loading into
HBase.

For example, say my flat file has the following rows, each with two
fields (product-id, amount):

P1, 1000
P2, 200
P3, 2500
P1, 1500
P2, 300

My rowkey is the product-id, and I have one column: details:amount=<val>

What I want is that, after the bulk load of the above file, the table has
the following rows and column values:

P1 -- details:amount=2500
P2 -- details:amount=500
P3 -- details:amount=2500
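
In other words, the load should behave like a group-by-sum over product-id.
A minimal sketch of the result I expect, in plain Python rather than HBase
code (the function name aggregate_amounts is only for illustration):

```python
from collections import defaultdict

def aggregate_amounts(lines):
    """Sum the amount field per product-id for 'product-id, amount' lines."""
    totals = defaultdict(int)
    for line in lines:
        # Tolerate optional whitespace around the comma separator.
        product_id, amount = (field.strip() for field in line.split(","))
        totals[product_id] += int(amount)
    return dict(totals)

rows = ["P1, 1000", "P2, 200", "P3, 2500", "P1, 1500", "P2, 300"]
totals = aggregate_amounts(rows)
for product_id in sorted(totals):
    print(f"{product_id} -- details:amount={totals[product_id]}")
```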

My understanding of bulk load is that when the map function gets a row
from the file, it can do some transformation, prepare the rowkey and
columns, and then write to the HBase table.

But in our case, the Mapper would need an HTable instance so that it can
do a GET to check whether the rowkey already exists, add the new amount to
the stored amount, and then write the sum back.
In that case, all the parallel mappers will open connections to the same
table and their GET/write sequences will not be synchronized, which leads
to race conditions, right?
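
To illustrate the interleaving I am worried about, here is a small sketch
in plain Python. A dict stands in for the table, and get_amount and
put_amount are hypothetical stand-ins for the GET and the write-back, not
HBase client calls:

```python
# A plain dict stands in for the HBase table (illustration only).
table = {}

def get_amount(rowkey):
    """Stand-in for a GET: returns 0 when the row does not exist yet."""
    return table.get(rowkey, 0)

def put_amount(rowkey, value):
    """Stand-in for the write-back: overwrites the stored amount."""
    table[rowkey] = value

# Mapper A handles "P1, 1000" and mapper B handles "P1, 1500".
seen_by_a = get_amount("P1")        # A reads 0
seen_by_b = get_amount("P1")        # B also reads 0, before A has written
put_amount("P1", seen_by_a + 1000)  # A writes 1000
put_amount("P1", seen_by_b + 1500)  # B overwrites it with 1500

print(table["P1"])  # 1500, not the expected 2500
```

With this ordering, mapper A's update is lost because B computed its sum
from a value it read before A wrote.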

Is this the right way to do it? If not, what other ways are there to
achieve this?

Thanks in advance,
Narayanan K
