From: Bryan Duxbury
Subject: Re: Does HBase support single-row transaction?
Date: Tue, 27 May 2008 12:57:38 -0700
To: hbase-user@hadoop.apache.org

It seems like if you wanted to do some manner of multi-row transactional put, the only real way to manage it is with deletes. That is, if the first put succeeds but the second fails, you can "invert" the first put into a bunch of deletes.

Trying to make the regions themselves maintain the transactional state seems like a terrible idea. You'd have to prevent a region from being migrated to another server while it's serving a transaction, and I think that would introduce a lot of potential performance problems.

Can you help me understand why atomic transactions are needed? Can't the atomicity problems largely be resolved by row versioning? Other databases that do transactions and rollbacks use versioning to accomplish that, I think.

-Bryan

On May 27, 2008, at 12:29 PM, Clint Morgan wrote:

> Zookeeper makes good sense for distributed locking to get isolation.
> But we still need transaction start, commit, and rollback to get
> atomicity. I think this properly belongs in hbase.
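Clint's example below relies on a pair of getZookeeperLock()/releaseZookeeperLock() helpers. Here is a minimal sketch of what they might look like on top of the stock ZooKeeper Java client; the lock path is illustrative, and the simple create-and-retry loop is a crude spin lock rather than the full race-free lock recipe.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class ZkTableLock {
  private final ZooKeeper zk;
  private final String lockPath;   // e.g. "/hbase-locks/<table>" -- illustrative path

  // Assumes the parent "/hbase-locks" znode has been created once up front.
  public ZkTableLock(ZooKeeper zk, String tableName) {
    this.zk = zk;
    this.lockPath = "/hbase-locks/" + tableName;
  }

  // Crude spin lock: keep trying to create an ephemeral znode until we win.
  // Ephemeral means the lock is released automatically if our session dies.
  public void acquire() throws KeeperException, InterruptedException {
    while (true) {
      try {
        zk.create(lockPath, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        return;                    // znode created: we hold the lock
      } catch (KeeperException.NodeExistsException e) {
        Thread.sleep(100);         // someone else holds it; wait and retry
      }
    }
  }

  public void release() throws KeeperException, InterruptedException {
    zk.delete(lockPath, -1);       // version -1 = unconditional delete
  }
}

With helpers along these lines, the try/finally in Clint's example below maps onto acquire() and release().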
>
> So suppose I want to read two rows, and then update them as an
> isolated, atomic action:
>
> try {
>   getZookeeperLock(table);
>   tranId = table.beginTransaction();
>   row1 = table.get();  // normal get, but isolated due to the distributed lock
>   row2 = table.get();
>   BatchUpdate b1 = new BatchUpdate(row1);
>   b1.put(...);
>   table.addUpdate(tranId, b1);
>   BatchUpdate b2 = new BatchUpdate(row2);
>   b2.put(...);
>   table.addUpdate(tranId, b2);
>   table.commit(tranId);
> } catch (Exception e) {
>   table.rollback(tranId);
> } finally {
>   releaseZookeeperLock(table);
> }
>
> So on the hbase side we hold on to the BatchUpdates until
> table.commit is called, then roll through and apply the updates.
>
> I'm sure rollback()/commit() is tricky to implement, as the updates
> could be on different region servers, so a failure on one needs to
> trigger a rollback on the others. We could use timestamps/old
> versions to implement rollback of BatchUpdates we have already
> applied.
>
> Alternatively, this could all be implemented above hbase: the client
> keeps track of updates and tries to roll back using timestamps. The
> problem there is that if the client dies midway through, half the
> transaction is committed and we lose atomicity/consistency.
>
> We will eventually want/need atomic transactions on hbase, so I'll
> look into this further. Any input would be appreciated. It would be
> interesting to know how/what Google provides...
>
> cheers,
> -clint
>
>
> On Sun, May 11, 2008 at 7:48 AM, Bryan Duxbury wrote:
>
>> Currently, it's not on our list of things to do. There are a number
>> of reasons why it would be better to use Zookeeper here than to try
>> and build it into HBase.
>>
>> That said, I think you could get everything you need if you tried
>> Zookeeper, using it to acquire locks on the rows you need a
>> transaction on. It's supposedly very high performance and supports
>> your use case precisely.
>>
>> -Bryan
>>
>> On May 10, 2008, at 11:52 PM, Zhou Wei wrote:
>>
>>> Bryan Duxbury wrote:
>>>>
>>>> startUpdate is deprecated in TRUNK. Also, it doesn't do what you
>>>> think it does. Committing a BatchUpdate is atomic across the whole
>>>> row, however. There is currently no way to make a get and a commit
>>>> transactional, though there is an issue open for
>>>> write-if-not-modified-since support. If this is something you
>>>> need, we can talk about how it might be supported.
>>>
>>> Thanks for answering my questions.
>>>
>>> So currently HBase is not suitable for transactional web
>>> applications. A simple counting transaction cannot work under
>>> concurrent access:
>>>
>>> transaction {
>>>   get(x);
>>>   x++;
>>>   write(x);
>>> }
>>>
>>> In my opinion, "write-if-not-modified-since" support may not be the
>>> best way to implement single-row transactions, because if the write
>>> cannot be performed, the application has to try again and again, or
>>> just return an error and leave the user to decide whether to retry
>>> or abort. Locking, waiting, and scheduling at the region server
>>> might be preferable in this case.
>>>
>>> Is the single-row transaction feature currently on the roadmap of
>>> HBase?
>>>
>>> Zhou
>>>>
>>>> -Bryan
>>>>
>>>> On May 7, 2008, at 7:48 PM, Zhou Wei wrote:
>>>>
>>>>> Hi,
>>>>> Does HBase support single-row transactions as described in the
>>>>> Bigtable paper?
>>>>>
>>>>> "Bigtable supports single-row transactions, which can be
>>>>> used to perform atomic read-modify-write sequences on
>>>>> data stored under a single row key."
>>>>>   --Bigtable paper
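Zhou's counter above is exactly the atomic read-modify-write the Bigtable quote describes. Purely to illustrate the "try again and again" behaviour he is worried about, here is a rough sketch of what such a counter could look like on top of the proposed write-if-not-modified-since support; the WriteIfUnmodifiedTable interface is entirely hypothetical, since HBase has no such call at the time of this thread.

import java.io.IOException;

// Hypothetical stand-in for the proposed feature: nothing like this exists in
// HBase yet; it only sketches the client-side shape such support might take.
interface WriteIfUnmodifiedTable {
  // Returns {value, timestamp} for the cell.
  long[] getValueAndTimestamp(String row, String column) throws IOException;
  // Writes value only if the cell's timestamp still equals expectedTimestamp.
  boolean putIfUnmodifiedSince(String row, String column, long value,
                               long expectedTimestamp) throws IOException;
}

class CounterClient {
  // Optimistic increment: read, bump, and write back only if nobody else wrote
  // in between; otherwise re-read and retry ("try again and again").
  static long increment(WriteIfUnmodifiedTable table, String row, String column)
      throws IOException {
    while (true) {
      long[] valueAndTs = table.getValueAndTimestamp(row, column);
      long next = valueAndTs[0] + 1;
      if (table.putIfUnmodifiedSince(row, column, next, valueAndTs[1])) {
        return next;               // our write won the race
      }
      // another client updated the cell first; loop and retry with a fresh read
    }
  }
}

Under heavy contention a loop like this can spin for a long time, which is exactly where Zhou's point about locking, waiting, and scheduling at the region server becomes attractive.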
>>>>>
>>>>> If so, how can I define a transaction in HBase?
>>>>> Does it look like this:
>>>>>
>>>>> lid = startUpdate
>>>>> get(lid)
>>>>> ...
>>>>> put(lid)
>>>>> ...
>>>>> commit(lid)
>>>>>
>>>>> Are these transactions isolated from each other?
>>>>> If not, is there a way to achieve that?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Zhou
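For reference against the startUpdate pseudocode above: Bryan's answer earlier in the thread is that startUpdate is deprecated in trunk and that committing a single BatchUpdate is atomic across the whole row. Below is a minimal sketch of that row-atomic write, assuming the trunk-era client API; the package locations, column names, and exact put()/commit() signatures are assumptions that may differ by version, and only the write itself is atomic, not a surrounding get-modify-write cycle.

import java.io.IOException;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class RowAtomicWrite {
  // All puts collected in one BatchUpdate are applied to the row as a single
  // atomic action when commit() is called: readers see all of them or none.
  public static void updateRow(HTable table, String row) throws IOException {
    BatchUpdate bu = new BatchUpdate(row);
    bu.put("info:count", "1".getBytes());        // column names are illustrative
    bu.put("info:last_update", "now".getBytes());
    table.commit(bu);
  }
}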