hbase-user mailing list archives
From Keith Wyss <keith.w...@explorys.com>
Subject Re: Mixing Puts and Deletes in a single RPC
Date Thu, 05 Jul 2012 19:05:14 GMT
Thanks for the info Ted,

Anyone tackled this problem before 0.94?


On 7/5/12 2:28 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:

>Take a look at HBASE-3584: Allow atomic put/delete in one call
>It is in 0.94, meaning it is not even in cdh4
>On Thu, Jul 5, 2012 at 11:19 AM, Keith Wyss <keith.wyss@explorys.com> wrote:
>> Hi,
>> My organization has been doing something zany to simulate atomic row
>> operations in HBase.
>> We have a converter-object model for the writables that are populated in
>> an HBase table, and one of the governing assumptions
>> is that if you are dealing with an Object record, you read all the
>> columns that compose it out of HBase or a different data source.
>> When we read lots of data in from a source system that we are trying to
>> mirror with HBase, if a column is null that means that whatever is
>> in HBase for that column is no longer valid. We have simulated what I
>> believe is now called an AtomicRowMutation by using a single Put
>> and populating it with blanks. The downside is the wasted space accrued
>> by the metadata for the blank columns.
>> Atomicity is not of utmost importance to us, but performance is. My
>> approach has been to create a Put and Delete object for a record and
>> populate the Delete with the null columns. Then we call
>> HTable.batch(List<Row>) on a bunch of these. It is my impression that
>> this shouldn't appreciably increase network traffic as the RPC calls will be
>> bundled.
>> Has anyone else addressed this problem? Does this seem like a reasonable
>> approach?
>> What sort of performance overhead should I expect?
>> Also, I've seen some Jira tickets about making this an atomic operation
>> in its own right. Is that something that
>> I can expect with CDH3U4?
>> Thanks,
>> Keith Wyss
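
For anyone reading this thread later, here is a minimal sketch of the
Put-plus-Delete split described in the quoted message. It is not code from
the thread: the class RecordSplitter, the RowPlan holder, and the
String-keyed record map are hypothetical names chosen for illustration, and
the sketch deliberately avoids the HBase client so it compiles on its own.
The comments note where the pre-0.94 client calls would plausibly go.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides, per record, which columns to write and which to clear.
class RecordSplitter {

    /** Holds the outcome for one row: values to write and qualifiers to remove. */
    static final class RowPlan {
        final String rowKey;
        final Map<String, String> toPut = new LinkedHashMap<>();
        final List<String> toDelete = new ArrayList<>();

        RowPlan(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /**
     * Splits a source record into non-null columns (to write) and null columns (to clear).
     * With the HBase client, toPut entries would feed Put.add(family, qualifier, value)
     * and toDelete entries would feed Delete.deleteColumns(family, qualifier); both
     * objects would then be added to the List<Row> passed to HTable.batch.
     */
    static RowPlan plan(String rowKey, Map<String, String> record) {
        RowPlan plan = new RowPlan(rowKey);
        for (Map.Entry<String, String> column : record.entrySet()) {
            if (column.getValue() == null) {
                plan.toDelete.add(column.getKey());
            } else {
                plan.toPut.put(column.getKey(), column.getValue());
            }
        }
        return plan;
    }

    /**
     * Only emit a Delete when there is at least one column to clear.
     * A Delete with no columns specified removes the entire row.
     */
    static boolean needsDelete(RowPlan plan) {
        return !plan.toDelete.isEmpty();
    }

    /** Only emit a Put when there is at least one value to write. */
    static boolean needsPut(RowPlan plan) {
        return !plan.toPut.isEmpty();
    }
}
```

Note that the Put and the Delete for one row sent this way are two separate
actions, so a reader could observe the row between them. That matches the
"atomicity is not of utmost importance" trade-off in the quoted message; the
atomic variant is what HBASE-3584 adds in 0.94.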
