hbase-dev mailing list archives

From "Ma, Ming" <min...@ebay.com>
Subject consistence of Increment and CheckAndPut
Date Thu, 09 Jun 2011 23:36:31 GMT
I looked at the implementation of Increment and CheckAndPut, and there could be a consistency issue.
Maybe that is by design, and for HBase application scenarios it is good enough. I just want to
confirm with folks whether that is the intention.

1.      Increment:
a.      Scenario. An Increment call in the client application triggers an RPC to the region
server carrying the increment value along with the cell information. After the region server
increments the value successfully, it tries to return the new value to the client, and at this
point the RPC response fails to reach the client due to a network issue. The client thinks the
operation failed, but the server has actually incremented the value. The client then retries,
which causes the value to be incremented a second time on the server side.
b.      For certain application scenarios this isn't much of an issue. For example: a) getting a
unique id; b) a large volume of analytics data such as "query hit count", which can tolerate some
inconsistency. The chance of this scenario happening is also quite low.
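
The retry scenario in 1.a can be reproduced with a small in-memory sketch. Nothing here is the HBase client API: `cell`, `incrementRpc`, `dropNextResponse` and `ResponseLost` are hypothetical stand-ins for the region server cell, the RPC, and a lost response. The only assumption is the one from the scenario, that the server applies the increment before the response is lost.

```java
import java.util.concurrent.atomic.AtomicLong;

// In-memory model of the lost-response scenario; not the HBase API.
class IncrementRetryDemo {
    // Stand-in for the counter cell held by the region server.
    static final AtomicLong cell = new AtomicLong(0);
    // Hypothetical fault injection: lose the response of the next call only.
    static boolean dropNextResponse = true;

    static class ResponseLost extends RuntimeException {}

    // "Server side": apply the increment first, then possibly lose the reply.
    static long incrementRpc(long amount) {
        long updated = cell.addAndGet(amount);
        if (dropNextResponse) {
            dropNextResponse = false;
            throw new ResponseLost();
        }
        return updated;
    }

    // "Client side": treat a lost response as a failure and retry.
    static long incrementWithRetry(long amount, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return incrementRpc(amount);
            } catch (ResponseLost e) {
                // Outcome unknown to the client; it assumes failure and retries.
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        long seen = incrementWithRetry(1, 3);
        System.out.println("client asked for +1 once, cell is now " + seen);
    }
}
```

The client asked for a single increment, but the cell ends up incremented twice, because the retry cannot tell "never applied" apart from "applied, reply lost".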

2.      Same for CheckAndPut.
a.      Scenario. The same as above: due to a network failure, the client and the server have
different views of whether the operation succeeded.
b.      The special case of "create a new row when it doesn't exist" will work fine: if the
CheckAndPut fails, the client will always go back and Get the value.
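
A minimal sketch of why the special case in 2.b tolerates a lost response, again using an in-memory table rather than HBase. `putIfRowAbsent` is a hypothetical stand-in for a CheckAndPut whose check is "the row does not exist yet", and `createOrGet` is an assumed shape for the client logic, not code from any real application.

```java
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of "create the row if absent, otherwise Get it"; not the HBase API.
class CreateIfAbsentDemo {
    static final ConcurrentHashMap<String, String> table = new ConcurrentHashMap<>();
    // Hypothetical fault injection: lose the response of the next call only.
    static boolean dropNextResponse = true;

    static class ResponseLost extends RuntimeException {}

    // "Server side": write only if the row is absent, then possibly lose the reply.
    static boolean putIfRowAbsent(String row, String value) {
        boolean applied = table.putIfAbsent(row, value) == null;
        if (dropNextResponse) {
            dropNextResponse = false;
            throw new ResponseLost();
        }
        return applied;
    }

    // "Client side": retry on a lost response; if the check fails, fall back to Get.
    static String createOrGet(String row, String value) {
        for (int attempt = 0; attempt < 3; attempt++) {
            try {
                if (putIfRowAbsent(row, value)) {
                    return value;          // this call created the row
                }
                return table.get(row);     // row already exists: read it back
            } catch (ResponseLost e) {
                // Outcome unknown; the retry is safe because the check will simply fail.
            }
        }
        throw new IllegalStateException("gave up after 3 attempts");
    }
}
```

Unlike Increment, the retry here cannot apply the write a second time: once the row exists the check fails, and the Get returns whichever value was stored first.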

From: Ma, Ming
Sent: Thursday, June 09, 2011 12:22 AM
To: 'user@hbase.apache.org'
Subject: RE: Does Put support "don't put if row exists"?

It looks like there is an HBase API called checkAndPut. By setting the expected value to "null",
you can achieve "put only when the row + column family + column qualifier doesn't exist". Nice.
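
The semantics described above can be modelled in a few lines. This is only a sketch of the compare-then-write behaviour, with a plain string key standing in for row + column family + column qualifier; `CheckAndPutModel` is a hypothetical class, and the real client call works on byte arrays and a Put object rather than strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model of check-and-put semantics; not the HBase client.
class CheckAndPutModel {
    private final Map<String, String> cells = new HashMap<>();

    // Writes newValue only if the current value equals expected.
    // An expected value of null means "the cell must not exist yet".
    synchronized boolean checkAndPut(String cellKey, String expected, String newValue) {
        String current = cells.get(cellKey);   // null when the cell is absent
        if (!Objects.equals(current, expected)) {
            return false;                      // check failed, nothing written
        }
        cells.put(cellKey, newValue);
        return true;
    }

    synchronized String get(String cellKey) {
        return cells.get(cellKey);
    }
}
```

With this model, the first `checkAndPut(key, null, "a")` on an empty table returns true, and a second call with a null expected value on the same key returns false and leaves "a" in place.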

From: Ma, Ming
Sent: Wednesday, June 08, 2011 9:54 PM
To: user@hbase.apache.org
Subject: Does Put support "don't put if row exists"?


Maybe this has been asked before. I couldn't find much information on this.

We have an application where multiple instances across different machines could try to insert
a new row with the same row key into a global HBase table at the same time. If the row has
already been inserted by one instance, we don't want the other instances to insert it again;
instead the other instances should Get the row after their Put fails with an "already exists" error.

It is somewhat similar to https://issues.apache.org/jira/browse/HBASE-493 , but here we need
HBase to check for row existence, rather than checking the version/timestamp.

The insertion rate is low, say 100 requests / sec. One way to implement this is to do it outside
HBase: the client application can use ZooKeeper to create a lock named after the row key.
The program would look like this:

if (!Row.Get()) {
    // take the ZooKeeper lock named after the row key
    // check again in case another instance has just inserted the same row
    if (!Row.Get()) {
        // the row doesn't exist, so Put it
    }
    // release the lock
}

Any suggestions?


