hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Does HBase do in-memory replication of rows?
Date Sun, 09 May 2010 17:18:54 GMT
Others have followed up on the central question, which is about durability, and have pointed
out that the text is misleading.

However, more generally, regarding the question "Does HBase do in-memory replication of rows?":

HBase will have a replication feature in the next release, independent of HDFS-layer data
block replication:

  HBASE-1295: https://issues.apache.org/jira/browse/HBASE-1295

This is cluster-to-cluster replication, at the HBase layer, and at a finer granularity than
the row.

HBase may also in the future evolve an optional extension to the BigTable architecture:

  HBASE-2357: https://issues.apache.org/jira/browse/HBASE-2357

and this I think also meets the definition of in-memory replication. While HBASE-2357 talks
about availability, I see this as a means for offering higher read scalability for some use
cases that can accept a relaxation of HBase's ACID guarantees.

So an answer to "Does HBase do in-memory replication of rows?" is also in part: Actually we
might do that, independent of providing durability by other means.

   - Andy

> From: MauMau
> Subject: Does HBase do in-memory replication of rows?
> To: hbase-user@hadoop.apache.org
> Date: Saturday, May 8, 2010, 5:16 AM
> Hello,
> 
> I'm comparing HBase and Cassandra, which I think are the
> most promising distributed key-value stores, to determine
> which one to choose for the future OLTP and data analysis.
> I found the following benchmark report by Yahoo! Research
> which evaluates HBase, Cassandra, PNUTS, and sharded MySQL.
> 
> http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
> http://www.brianfrankcooper.net/pubs/ycsb.pdf
> 
> The above report refers to HBase 0.20.3.
> Reading this and HBase's documentation, two questions about
> load balancing and replication have arisen. Could anyone give
> me any information to help solve these questions?
> 
> [Q2] replication
> Does HBase perform in-memory replication of rows like
> Cassandra?
> Does HBase sync updates to disk before returning success to
> clients?
> 
> According to the following paragraph in HBase design
> overview, HBase syncs writes.
> 
> ----------------------------------------
> Write Requests
> When a write request is received, it is first written to a
> write-ahead log called a HLog. All write requests for every
> region the region server is serving are written to the same
> HLog. Once the request has been written to the HLog, the
> result of changes is stored in an in-memory cache called the
> Memcache. There is one Memcache for each Store.
> ----------------------------------------
> 
> The source code of the Put class appears to show the above
> (though I don't understand the server-side code yet):
> 
>  private boolean writeToWAL = true;
> 
> However, Yahoo's report writes as follows. Is this
> incorrect? What is in-memory replication? I know HBase
> relies on HDFS to replicate data on the storage, but not in
> memory.
> 
> ----------------------------------------
> For Cassandra, sharded MySQL and PNUTS, all updates were
> synched to disk before returning to the client. HBase does
> not sync to disk, but relies on in-memory replication across
> multiple servers for durability; this increases write
> throughput and reduces latency, but can result in data loss
> on failure.
> ----------------------------------------
> 
> Maumau
> 
> 
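
The write path quoted above from the design overview (append to the HLog first, then
apply the change to the in-memory store) can be illustrated with a toy model. This is
only a sketch and not HBase code: WalSketch, ToyRegion, and recover() are hypothetical
names, and the "log" here is a plain in-memory list. It does not model whether the log
append is actually synced to disk, which is the durability question this thread is
about. It only shows why an edit that skips the log cannot be recovered by replay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of "log first, then memory". Hypothetical names; not HBase code. */
public class WalSketch {

    /** Hypothetical stand-in for the write path of one region. */
    public static class ToyRegion {
        // Stands in for the HLog (assumption: a simple append-only list).
        private final List<String> log = new ArrayList<>();
        // Stands in for the in-memory store (the "Memcache" in the quoted text).
        private final Map<String, String> memstore = new TreeMap<>();

        public void put(String row, String value, boolean writeToWAL) {
            if (writeToWAL) {
                log.add(row + "=" + value); // 1. append the edit to the log
            }
            memstore.put(row, value);       // 2. then apply it in memory
        }

        /** Simulates a crash: memory is lost and state is rebuilt from the log. */
        public ToyRegion recover() {
            ToyRegion fresh = new ToyRegion();
            for (String entry : log) {
                int sep = entry.indexOf('=');
                fresh.put(entry.substring(0, sep), entry.substring(sep + 1), true);
            }
            return fresh;
        }

        public String get(String row) {
            return memstore.get(row);
        }
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion();
        region.put("row1", "a", true);   // logged, survives replay
        region.put("row2", "b", false);  // not logged, lost on crash

        ToyRegion recovered = region.recover();
        System.out.println(recovered.get("row1")); // a
        System.out.println(recovered.get("row2")); // null
    }
}
```

In this model the edit written with writeToWAL = false is visible until the simulated
crash and gone afterwards, while the logged edit is rebuilt by replay.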


      

