hbase-user mailing list archives

From Keith Wyss <keith.w...@explorys.com>
Subject Mixing Puts and Deletes in a single RPC
Date Thu, 05 Jul 2012 18:19:20 GMT

My organization has been doing something zany to simulate atomic row operations in HBase.

We have a converter-object model for the writables that are populated in an HBase table, and
one of the governing assumptions is that if you are dealing with an Object record, you read
all the columns that compose it out of HBase or a different data source.

When we read lots of data in from a source system that we are trying to mirror with HBase,
a null column means that whatever is in HBase for that column is no longer valid. We have
simulated what I believe is now called an AtomicRowMutation by using a single Put and
populating the null columns with blank values. The downside is the space wasted on the
metadata stored for each blank column.
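To put a rough number on that wasted space, here is a minimal sketch that estimates the raw size of one blank cell. It assumes the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type, plus the row, family, and qualifier bytes). The class name, method name, and sample lengths are hypothetical, and the figure ignores compression and HFile block and index overhead.

```java
public class BlankCellOverhead {
    // Fixed bytes per cell in the assumed KeyValue layout:
    // key length (4) + value length (4) + row length (2)
    // + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_BYTES = 4 + 4 + 2 + 1 + 8 + 1;

    // Raw size of a cell whose value is empty: only metadata is stored.
    static int blankCellBytes(int rowLen, int familyLen, int qualifierLen) {
        return FIXED_BYTES + rowLen + familyLen + qualifierLen;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 16-byte row key, 1-byte family, 10-byte qualifier.
        System.out.println(blankCellBytes(16, 1, 10)); // prints 47
    }
}
```

Under these assumptions every blank column costs a few dozen bytes per row before compression, most of it the repeated row key and qualifier.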

Atomicity is not of utmost importance to us, but performance is. My approach has been to create
a Put and a Delete object for each record and populate the Delete with the null columns. Then we
call HTable.batch(List<Row>) on a bunch of these. It is my impression that this
shouldn't appreciably increase network traffic, as the RPC calls will be bundled.
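The per-record split described above can be sketched without the HBase client on the classpath. This is an illustration under stated assumptions, not the actual converter code: the class `RecordMutationSplitter`, its `Split` holder, and the column names are all hypothetical. In a real client, the non-null entries would feed a Put (for example via `Put.add(family, qualifier, value)`), the null entries would feed a Delete (for example via `Delete.deleteColumns(family, qualifier)`), and both objects would go into the `List<Row>` passed to `HTable.batch`. Exact signatures vary between HBase versions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordMutationSplitter {
    // Hypothetical holder for the two halves of one record's mutation.
    static final class Split {
        final Map<String, byte[]> toPut = new LinkedHashMap<>(); // columns with values
        final List<String> toDelete = new ArrayList<>();         // columns that were null
    }

    // Partition a record's columns: non-null values are written,
    // null values mark columns whose stored data is no longer valid.
    static Split split(Map<String, byte[]> record) {
        Split s = new Split();
        for (Map.Entry<String, byte[]> e : record.entrySet()) {
            if (e.getValue() == null) {
                s.toDelete.add(e.getKey());
            } else {
                s.toPut.put(e.getKey(), e.getValue());
            }
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, byte[]> record = new LinkedHashMap<>();
        record.put("name", "Ada".getBytes());
        record.put("phone", null); // null in the source system
        Split s = split(record);
        System.out.println("put: " + s.toPut.keySet() + ", delete: " + s.toDelete);
    }
}
```

Keeping the split separate from the client calls makes it easy to skip the Delete entirely when a record has no null columns, so the batch carries only the operations it needs.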

Has anyone else addressed this problem? Does this seem like a reasonable approach?
What sort of performance overhead should I expect?

Also, I've seen some Jira tickets about making this an atomic operation in its own right.
Is that something that I can expect with CDH3U4?


Keith Wyss
