hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Re: sync on writes
Date Wed, 01 Aug 2012 16:29:59 GMT
"sync" is a fluffy term in HDFS. HDFS has hsync and hflush.
hflush forces all current changes at a DFSClient to all replica nodes (but not to disk).

Until HDFS-744, hsync was identical to hflush. After HDFS-744, hsync can be used to force
data to disk at the replicas.

When HBase refers to "sync", it means hflush semantics (at least until HBASE-5954 is finished).
I.e. a sync here ensures that the replica nodes have seen the changes, which is what you want.

So when you say "since another copy is always there on the replica nodes", that is only guaranteed
after an hflush (again, which HBase calls sync).
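
To make the distinction concrete, here is a toy model in plain Java. It is only a sketch of the semantics described above, not HDFS or HBase code: the class names (SyncSemanticsSketch, Replica, Client) and their methods are hypothetical stand-ins, and the "disk" is just a counter. It shows that a written edit is invisible to the replicas until hflush, and is not on disk at the replicas until hsync (with post-HDFS-744 behaviour assumed).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SyncSemanticsSketch {

    /** Hypothetical stand-in for one replica node in the write pipeline. */
    public static class Replica {
        private final List<String> received = new ArrayList<>(); // edits this node has seen
        private int durableCount = 0;                             // how many of them are "on disk"

        public void receive(List<String> edits) { received.addAll(edits); }

        public void persist() { durableCount = received.size(); }

        public boolean hasSeen(String edit) { return received.contains(edit); }

        public boolean isOnDisk(String edit) {
            int i = received.indexOf(edit);
            return i >= 0 && i < durableCount;
        }
    }

    /** Hypothetical stand-in for the writing client (the DFSClient role). */
    public static class Client {
        private final List<String> buffer = new ArrayList<>(); // edits not yet sent anywhere
        private final List<Replica> pipeline;

        public Client(List<Replica> pipeline) { this.pipeline = pipeline; }

        /** A plain write only lands in the client-side buffer. */
        public void write(String edit) { buffer.add(edit); }

        /** hflush: every replica sees the buffered edits, but nothing is forced to disk. */
        public void hflush() {
            for (Replica r : pipeline) {
                r.receive(buffer);
            }
            buffer.clear();
        }

        /** hsync (assumed post-HDFS-744 behaviour): hflush, then force to disk at each replica. */
        public void hsync() {
            hflush();
            for (Replica r : pipeline) {
                r.persist();
            }
        }
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(new Replica(), new Replica(), new Replica());
        Client client = new Client(replicas);
        Replica first = replicas.get(0);

        client.write("edit-1");
        System.out.println("after write : seen=" + first.hasSeen("edit-1"));

        client.hflush();
        System.out.println("after hflush: seen=" + first.hasSeen("edit-1")
                + " onDisk=" + first.isOnDisk("edit-1"));

        client.hsync();
        System.out.println("after hsync : onDisk=" + first.isOnDisk("edit-1"));
    }
}
```

In this model, skipping hflush after a write leaves the edit only in the client's buffer, so losing the client loses the edit no matter how many replicas are configured.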

I've also written about this here: http://hadoop-hbase.blogspot.com/2012/05/hbase-hdfs-and-durable-sync.html

-- Lars

 From: Mohit Anchlia <mohitanchlia@gmail.com>
To: user@hbase.apache.org 
Sent: Tuesday, July 31, 2012 6:09 PM
Subject: sync on writes
In the HBase book it is mentioned that the default behaviour of a write is to
call sync on each node before sending replica copies to the nodes in the
pipeline. Is there a reason this was kept as the default? If data is
getting written to multiple nodes, then the likelihood of losing data is really
low, since another copy is always there on the replica nodes. Is it OK to
make this sync async, and is it advisable?