hbase-dev mailing list archives

From Kannan Muthukkaruppan <Kan...@facebook.com>
Subject RE: commit semantics
Date Tue, 12 Jan 2010 19:29:46 GMT
Dhruba & I just talked off-line about this as well. Yes, writing to two clusters would
result in unnecessary complexity... we will essentially need to deal with inconsistencies
between the two clusters at the application level.

For data integrity, going with group commits (batch commits) seems like a good option. My
understanding of group commits as implemented in 0.21 is as follows:

* We hold off acknowledging the client until the transaction has been synced to HDFS.

* Syncs are batched: a sync is called when the queue has enough transactions or when a
timer expires. (I would imagine that both the number of transactions to batch up and the
timer are already configurable knobs?) In this mode, the added write latency seen by the
client is bounded above by the timer setting plus the cost of the sync itself.
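
To make the two points above concrete, here is a minimal Python sketch of group commit as I understand it. It is not the 0.21 implementation: the class name `GroupCommitLog`, the knobs `max_batch` and `max_wait_s`, and the `sync_fn` callback standing in for the HDFS sync are all hypothetical names chosen for illustration.

```python
import threading
import time


class GroupCommitLog:
    """Toy log that batches syncs; a sketch, not HBase's actual code."""

    def __init__(self, sync_fn, max_batch=3, max_wait_s=0.05):
        self._sync_fn = sync_fn        # stand-in for the real HDFS sync
        self._max_batch = max_batch    # hypothetical knob: edits per batch
        self._max_wait_s = max_wait_s  # hypothetical knob: timer in seconds
        self._cond = threading.Condition()
        self._pending = []             # edits queued but not yet synced
        self._appended = 0             # sequence number of the last queued edit
        self._synced = 0               # sequence number of the last synced edit
        self._closed = False
        self._syncer = threading.Thread(target=self._run, daemon=True)
        self._syncer.start()

    def append(self, edit):
        """Queue an edit and block until a sync covering it has completed."""
        with self._cond:
            if self._closed:
                raise RuntimeError("log is closed")
            self._pending.append(edit)
            self._appended += 1
            my_seq = self._appended
            self._cond.notify_all()    # wake the syncer; the batch may be full
            while self._synced < my_seq:
                self._cond.wait()      # the client is held here until the sync

    def _run(self):
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:
                    return             # closed and nothing left to flush
                # The timer starts when the syncer first sees a queued edit.
                deadline = time.monotonic() + self._max_wait_s
                while len(self._pending) < self._max_batch and not self._closed:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        break
                    self._cond.wait(remaining)
                batch, self._pending = self._pending, []
                high = self._appended  # last sequence number in this batch
            self._sync_fn(batch)       # sync outside the lock so appends can queue
            with self._cond:
                self._synced = high
                self._cond.notify_all()  # release every client in the batch

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._syncer.join()


if __name__ == "__main__":
    synced_batches = []
    log = GroupCommitLog(lambda b: synced_batches.append(list(b)))
    writers = [threading.Thread(target=log.append, args=(i,)) for i in range(5)]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    log.close()
    print(synced_batches)  # batch boundaries depend on thread timing
```

One caveat on the latency bound: in this sketch an edit that arrives while a sync is already in progress also waits for that sync to finish before its own timer starts, so its worst case is somewhat higher than the timer plus a single sync.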

From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of stack
Sent: Tuesday, January 12, 2010 10:52 AM
To: hbase-dev@hadoop.apache.org
Cc: Kannan Muthukkaruppan; Dhruba Borthakur
Subject: Re: commit semantics

On Tue, Jan 12, 2010 at 10:14 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:
Hi stack,

I was meaning "what if the application inserted the same record into two
Hbase instances"? Of course, now the onus is on the appl to keep both of
them in sync and recover from any inconsistencies between them.

Ok.  Like your  "Overlapping Clusters for HA" from http://www.borthakur.com/ftp/hdfs_high_availability.pdf?

I'm not sure how the application could return after writing to one cluster without waiting
for the second write to complete, as you suggest above.  It could write in parallel, but the
second thread might not complete for myriad reasons.  What then?  And as you say, on reads
the client would have to reconcile the two.

Isn't there already a 'scalable database' that gives you this headache for free, without
any work on your part (smile)?

Do you think there is a problem with syncing on every write (with some batching of writes
under high concurrency) or, if that is too slow for your needs, with holding clients until
the sync happens, as joydeep suggests?  Will that be sufficient data-integrity-wise?


