incubator-cassandra-user mailing list archives

From Dave Viner <davevi...@gmail.com>
Subject Re: Facebook messaging and choice of HBase over Cassandra - what can we learn?
Date Mon, 22 Nov 2010 01:56:36 GMT
I don't know the details of operation of HBase, so I can't speak on that
point.  But I do know that Facebook hired Jonathan Gray, former CTO of
Streamy, who is a huge HBase contributor. Streamy ended in Mar 2010 -
although I'm not sure when he went to work for Facebook.

He presented on HBase at the Hadoop conference in October in NYC:
http://mpouttuclarke.wordpress.com/2010/10/18/notes-from-hadoop-world-2010-nyc/

Again, I don't know the chronology (whether he was hired before the decision
to use HBase or after).  But I know that Jonathan is a fantastically smart
(and extremely nice) guy and I'm sure he could make HBase bend to his will
at any point.

Dave Viner

On Sun, Nov 21, 2010 at 4:16 PM, Todd Lipcon <todd@lipcon.org> wrote:

> On Sun, Nov 21, 2010 at 2:06 PM, Edward Ribeiro <edward.ribeiro@gmail.com> wrote:
>
>>
>>> Also I believe saying HBase is consistent is not true. This can happen:
>>> write to region server -> region server acknowledges client -> write
>>> to WAL -> region server fails = write lost
>>>
>>> I wonder how facebook will reconcile that. :)
>>>
>>
>> Are you sure about that? Isn't the write put in the WAL before the client is acked?
>>
>> According to these posts[1][2], "if writing the record to the WAL fails
>> the whole operation must be considered a failure.", so it would be nonsense
>> to acknowledge clients before writing the lifeline. I hope some Cloudera guy
>> can explain this...
>>
>>
> [only jumping in because info was requested - those who know me know that I
> think Cassandra is a very interesting architecture and a better fit for many
> applications than HBase]
>
> You can operate the commit log in two different modes in HBase. One mode is
> "deferred log flush", where the region server appends but does not sync()
> the commit log to HDFS on every write, but rather on a periodic basis (e.g.
> once a second). This is similar to the innodb_flush_log_at_trx_commit=2
> option in MySQL for example. This has slightly better performance obviously
> since the writer doesn't need to wait on the commit, but as you noted
> there's a window where a write may be acknowledged but then lost. This is an
> issue of *durability* more so than consistency.
>
> In the other mode of operation (default in recent versions of HBase) we do
> not acknowledge a write until it has been pushed to the OS buffer on the
> entire pipeline of log replicas. Obviously this is slower, but it results in
> "no lost data" regardless of any machine failures. Additionally, concurrent
> readers do not see written data until these same properties have been
> satisfied. So this mode is 100% consistent and 100% durable. In practice,
> this affects latency significantly since it adds two extra round trips to
> each write, but system throughput is only reduced by 20-30% since the
> commits are pipelined (see HDFS-895 for gory details).
>
> I believe Cassandra has similar tuning options about whether to sync every
> commit to the log or only do so periodically.
>
> If you're interested in learning more, feel free to reference this
> documentation:
> http://hbase.apache.org/docs/r0.89.20100726/acid-semantics.html
>
>
>
>> Besides that, you know that WAL is written to HDFS that takes care of
>> replication and fault tolerance, right? Of course, even so, there's a
>> "window of inconsistency" before the HLog is flushed to disk, but I don't
>> think you can dismiss this as not consistent. At most, you may classify it
>> as "eventually consistent". :)
>>
>> [1] http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>> [2]
>> http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html
>>
>> E. Ribeiro
>>
>>
>
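
To make the difference between the two commit-log modes Todd describes
concrete, here is a toy Python sketch. It is not HBase code: the class
ToyCommitLog, its methods, and the run() helper are all hypothetical names
made up for illustration, and "sync" here just moves records into a list
that survives a simulated crash. It ignores HDFS replication, the OS buffer
on the replica pipeline, and reader visibility entirely.

```python
class ToyCommitLog:
    """Toy model of a commit log with two flush modes (not HBase code)."""

    def __init__(self, deferred_flush):
        self.deferred_flush = deferred_flush
        self.buffer = []   # appended but not yet synced
        self.durable = []  # records that survive a crash

    def write(self, record):
        """Append a record and return True once it is acknowledged."""
        self.buffer.append(record)
        if not self.deferred_flush:
            # Sync-on-write mode: do not ack until the record is durable.
            self.sync()
        # Deferred mode: ack immediately; a periodic sync() happens later.
        return True

    def sync(self):
        """Move everything in the buffer to durable storage."""
        self.durable.extend(self.buffer)
        self.buffer.clear()

    def crash(self):
        """Simulate a server failure: unsynced records are gone."""
        self.buffer.clear()
        return list(self.durable)


def run(deferred_flush):
    """Ack three writes, crash before any periodic sync, report the result."""
    log = ToyCommitLog(deferred_flush)
    acked = [r for r in ("a", "b", "c") if log.write(r)]
    recovered = log.crash()
    lost = [r for r in acked if r not in recovered]
    return acked, recovered, lost


if __name__ == "__main__":
    print("deferred flush:", run(True))
    print("sync on write: ", run(False))
```

In the deferred case all three writes are acknowledged and then lost, which
is the durability window discussed above; in the sync-on-write case nothing
acknowledged is lost, at the cost of waiting for the sync on every write.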
