cassandra-user mailing list archives

From Todd Lipcon <t...@lipcon.org>
Subject Re: Facebook messaging and choice of HBase over Cassandra - what can we learn?
Date Mon, 22 Nov 2010 00:16:16 GMT
On Sun, Nov 21, 2010 at 2:06 PM, Edward Ribeiro <edward.ribeiro@gmail.com> wrote:

>
> Also I believe saying HBASE is consistent is not true. This can happen:
>> Write to region server. -> Region Server acknowledges client-> write
>> to WAL -> region server fails = write lost
>>
>> I wonder how facebook will reconcile that. :)
>>
>
> Are you sure about that? Client writes to WAL before ack user?
>
> According to these posts[1][2], "if writing the record to the WAL fails the
> whole operation must be considered a failure.", so it would be nonsense to
> acknowledge clients before writing the lifeline. I hope a Cloudera guy can
> explain this...
>
>
[only jumping in because info was requested - those who know me know that I
think Cassandra is a very interesting architecture and a better fit for many
applications than HBase]

You can operate the commit log in two different modes in HBase. One mode is
"deferred log flush", where the region server appends to the commit log but
does not sync() it to HDFS on every write, only on a periodic basis (e.g.
once a second). This is similar to the innodb_flush_log_at_trx_commit=2
option in MySQL, for example. This performs slightly better, since the
writer doesn't need to wait on the commit, but as you noted there's a window
where a write may be acknowledged but then lost. This is an issue of
*durability* more so than consistency.

In the other mode of operation (default in recent versions of HBase) we do
not acknowledge a write until it has been pushed to the OS buffer on the
entire pipeline of log replicas. Obviously this is slower, but it results in
"no lost data" regardless of any machine failures. Additionally, concurrent
readers do not see written data until these same properties have been
satisfied. So this mode is 100% consistent and 100% durable. In practice,
this affects latency significantly, since it adds two extra round trips to
each write, but system throughput is only reduced by 20-30% since the
commits are pipelined (see HDFS-895 for gory details).
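
To make the difference between the two modes concrete, here is a minimal
sketch in Python. It is a toy model, not HBase code: the names ToyCommitLog
and demo are hypothetical, the "log" is a pair of in-memory lists rather
than a file on HDFS, and the replica pipeline is not modeled at all. It only
shows when an acknowledged write can be lost.

```python
class ToyCommitLog:
    """In-memory stand-in for a commit log (hypothetical, not HBase code)."""

    def __init__(self, deferred_flush):
        self.deferred_flush = deferred_flush
        self.buffered = []  # appended but not yet synced
        self.durable = []   # synced, survives a crash

    def write(self, record):
        """Append a record and return once the write is acknowledged."""
        self.buffered.append(record)
        if not self.deferred_flush:
            self.sync()  # sync before acknowledging the client
        return "ack"

    def sync(self):
        """Move everything buffered so far into durable storage."""
        self.durable.extend(self.buffered)
        self.buffered.clear()

    def crash(self):
        """Simulate a server failure: unsynced records are lost."""
        lost = list(self.buffered)
        self.buffered.clear()
        return lost


def demo(deferred_flush):
    """Write three records, crash, and report (durable, lost) records."""
    log = ToyCommitLog(deferred_flush)
    log.write("a")
    log.write("b")
    if deferred_flush:
        log.sync()  # stands in for the periodic sync firing
    log.write("c")  # acknowledged in both modes
    lost = log.crash()
    return log.durable, lost
```

With deferred_flush=True the third write is acknowledged and then lost in
the crash; with deferred_flush=False every acknowledged write is already
durable when the crash happens.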

I believe Cassandra has similar tuning options about whether to sync every
commit to the log or only do so periodically.

If you're interested in learning more, feel free to reference this
documentation:
http://hbase.apache.org/docs/r0.89.20100726/acid-semantics.html



> Besides that, you know that the WAL is written to HDFS, which takes care of
> replication and fault tolerance, right? Of course, even so, there's a
> "window of inconsistency" before the HLog is flushed to disk, but I don't
> think you can dismiss this as not consistent. At most, you may classify it
> as "eventually consistent". :)
>
> [1] http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
> [2]
> http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>
> E. Ribeiro
>
>
