hbase-user mailing list archives

From Maysam Yabandeh <may...@yahoo-inc.com>
Subject Re: Hbase Transactional support
Date Tue, 20 Mar 2012 09:23:24 GMT

It is up to you to decide the granularity of transactions, whether at the reduce level or
at the M/R job level. Your application just needs to be able to (efficiently) rerun the transaction
in case of an abort. The transaction feature provides your application with two good properties:
(i) ignoring the partial changes made by failed clients, and (ii) providing isolation between
concurrent transactions. I guess what you need from transactions in an M/R job is the former.
This is because when the write sets of transactions are large, the probability of a write-write
conflict between two transactions becomes high, and it is hard to make progress with
so many aborts. If you are planning to run long transactions (with large write sets) in parallel,
write-write conflicts should be avoided at the application layer by having
the concurrent transactions write to different data elements. In this case, Omid could
also be optimized by disabling the submission of the ids of the write set to the status oracle.
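
To make the abort-and-rerun behaviour above concrete, here is a minimal sketch in Python. It is only a toy model of write-write conflict detection under snapshot isolation, not Omid's actual API: the names ToyOracle, Aborted and run_with_retry are hypothetical and exist only for this illustration. A transaction aborts if any row in its write set was committed by someone else after the transaction's start timestamp, and the application reruns the whole transaction body when that happens.

```python
class Aborted(Exception):
    """Raised when a transaction loses a write-write conflict."""


class ToyOracle:
    """Hypothetical stand-in for a status oracle; not Omid's real API."""

    def __init__(self):
        self.clock = 0          # logical timestamp counter
        self.last_commit = {}   # row key -> commit timestamp of its last writer

    def begin(self):
        # The start timestamp defines the snapshot the transaction reads from.
        self.clock += 1
        return self.clock

    def commit(self, start_ts, write_set):
        # Check every row first, so an aborted transaction records nothing.
        for row in write_set:
            if self.last_commit.get(row, 0) > start_ts:
                raise Aborted(row)
        self.clock += 1
        for row in write_set:
            self.last_commit[row] = self.clock
        return self.clock


def run_with_retry(oracle, body, max_attempts=5):
    """Rerun the whole transaction body until it commits; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        start_ts = oracle.begin()
        write_set = body(start_ts)  # body returns the set of rows it wrote
        try:
            oracle.commit(start_ts, write_set)
            return attempt
        except Aborted:
            continue  # conflict: start over with a fresh snapshot
    raise RuntimeError("gave up after %d attempts" % max_attempts)


if __name__ == "__main__":
    oracle = ToyOracle()
    t1, t2 = oracle.begin(), oracle.begin()
    oracle.commit(t1, {"row-a", "row-b"})
    try:
        oracle.commit(t2, {"row-b", "row-c"})  # overlaps t1 on row-b
    except Aborted as conflict:
        print("t2 aborted on", conflict)
```

In this model, two concurrent transactions whose write sets are disjoint never trigger the check in commit, which is the situation where the email suggests the write-set ids would not need to be sent to the status oracle at all.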

- Maysam Yabandeh

On Mar 20, 2012, at 2:13 AM, Deepika Khera wrote:

Thanks Maysam. I am trying out Omid to see if it will fit my needs.

As I told you, I am writing to HBase from map reduce jobs. If my commit
and rollback are around a reducer task, then it will be quite
straightforward. But if the commit should happen only if all tasks of the
M/R job succeed (which is what I would want, because if some reducer tasks
succeed and some fail, it will not be possible to rerun partial data),
it gets tricky.
Am I on the wrong track?


On Mon, 2012-03-19 at 11:44 -0700, Maysam Yabandeh wrote:
Hi Deepika,

Omid provides Snapshot Isolation (SI), which is a well-known isolation guarantee in database
systems such as Oracle. In short, each transaction reads from a consistent snapshot that does
not include partial changes by concurrent (or failed) transactions. SI also prevents write-write
conflicts between concurrent transactions. The overhead of Omid on HBase is negligible, and it
does not require any changes to HBase, with the sole exception of the HBase garbage collection
algorithm, which is replaced via a coprocessor. hbase-trx, on the other hand, does not provide
read snapshots and is not safe with client failures. You can find a more detailed comparison
in the Omid wiki page:

- Maysam Yabandeh

On Mar 19, 2012, at 6:49 PM, Deepika Khera wrote:


I have some map reduce jobs that write to HBase. I am trying to pick a
library that could provide transactional support for HBase. I looked at
Omid and hbase-trx.

Could you please provide me with a comparison between the two so I can
make the right choice?
Are there any other ways to do this?

