hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Please help me overcome HBase's weaknesses
Date Sat, 04 Sep 2010 22:19:34 GMT
Answers inline.

> -----Original Message-----
> From: MauMau [mailto:maumau307@gmail.com]
> Sent: Saturday, September 04, 2010 9:31 AM
> To: user@hbase.apache.org
> Subject: Please help me overcome HBase's weaknesses
> 
> Hello,
> 
> We are considering which of HBase or Cassandra to choose for our future
> projects. I'm recommending HBase to my boss and coworkers, because
> HBase is
> good both for analysis (MapReduce) and for OLTP (get/put provides
> relatively
> fast response). Cassandra is superior in get/put response time, but it
> does
> not seem to be good at MapReduce because it can't perform range queries
> based on row keys (OPP can, but it seems difficult to use).
> 
> However, my boss points out the following as the weaknesses of HBase
> and
> insists that we choose Cassandra. I prefer HBase because HBase has
> stronger
> potential, thanks to its active community and rich ecosystem backed by
> the
> membership of Hadoop family. Are there any good explanations (or future
> improvement plans/ideas) to persuade him and change his mind?
> 
> (1) Ease of use
> Cassandra does not require any other software. All nodes of Cassandra
> have
> the same role. Pretty easy.
> On the other hand, HBase requires HDFS and ZooKeeper. Users have to
> manipulate and manage HDFS and ZooKeeper. The nodes in the cluster have
> various roles, and the users need to design the placement of different
> types
> of nodes.

There are certainly more components to HBase.

The flipside of this is that after some time working with HBase, you can have a solid understanding
of most elements of its architecture and of HDFS.  Cassandra may work, but generally users have
no idea how it works.  Magic, of course.

Both are fairly immature projects, so problems/issues are almost a certainty if you are pushing
things.  Despite a higher learning curve, my preference would be for a system I'm able to comprehend.
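
For concreteness, here is roughly what the extra moving parts look like from the client side.
This is only a sketch against the HTable client API of this era; the table, column family, and
ZooKeeper hostnames are made up.  The client only needs to know the ZooKeeper quorum; the
RegionServers themselves sit on top of HDFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
  public static void main(String[] args) throws Exception {
    // The client discovers the cluster through ZooKeeper; hostnames are hypothetical.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");

    // Assumes a table "mytable" with column family "cf" already exists.
    HTable table = new HTable(conf, "mytable");
    Put put = new Put(Bytes.toBytes("row1"));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
    table.put(put);
    table.close();
  }
}

So the operational cost is running HDFS + ZooKeeper, but the client-facing surface stays small.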

> (2) Failover time
> One of our potential customers requires that the system completes
> failover
> within one second. "One second" means the interval between when the
> system
> detects node failure and when the clients regain access to data.
> Cassandra continues to process requests if one of three replica nodes
> remains. Therefore, the requirement is met.
> However, HBase seems to take minutes, because it needs to reassign
> regions
> to live region servers, open reassigned regions and load their block
> index
> into memory, and perform log application. As the hardware gets more
> powerful, each node will be able to handle more regions. As a result,
> failover time will get longer in proportion to the number of regions,
> won't
> it?
> ## My question:
> Is it possible to improve failover time? If yes, how long will it get
> shortened?
> ##

If you have a strong requirement that data never be unavailable for more than one second, I
think Cassandra would be a clear winner here.  Is this a requirement just for
reads, for writes, or both?

The flipside of this, and the way Cassandra can have such high availability, is eventual consistency.
 If you can deal with that and require always being able to write, then HBase recovery time
(as it is today) will simply not do.

We are continuing to make improvements but will never be at the level of Cassandra in this
regard because of our strong consistency guarantees.

There are plenty of ways people deal with data unavailability, primarily with caching layers
above HBase when used in a serving context.
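
As a rough sketch of that caching pattern (the cache interface here is hypothetical, not an
HBase or memcached API; only the HTable/Get calls are real), a read path can serve
possibly-stale cached data while a region is in transition:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Stand-in for whatever cache sits in front of HBase (memcached, an in-process map, etc.).
interface CacheClient {
  byte[] get(byte[] key);
  void put(byte[] key, byte[] value);
}

class FallbackReader {
  static final byte[] FAMILY = Bytes.toBytes("cf");
  static final byte[] QUALIFIER = Bytes.toBytes("col");

  static byte[] read(HTable table, CacheClient cache, byte[] row) {
    try {
      Result result = table.get(new Get(row));
      byte[] value = result.getValue(FAMILY, QUALIFIER);
      if (value != null) {
        cache.put(row, value);   // keep the cache warm on the happy path
      }
      return value;
    } catch (IOException e) {
      // Region temporarily offline during RegionServer recovery: fall back to
      // possibly-stale cached data rather than failing the read.
      return cache.get(row);
    }
  }
}

You give up freshness during the outage, of course, which is its own flavor of eventual consistency.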

> (3) SPOF
> Cassandra has no SPOF. HBase and future HDFS eliminate SPOF by using
> backup masters; however, master failure *can* affect the entire system
> operation in some way. How long does it take to detect master failure,
> promote one of the backup masters to the new master, and return to
> normal operation?

The master in HBase is not responsible for reads/writes.  So you can continue using a cluster
without a Master, and there is not a loss of data availability just because the Master is down.

Once detected, master failover is extremely quick.  The primary issue is the detection.  The
faster you try to make the detection, the more likely you trigger false positives.  This is
a trade-off in most any HA/FT system.
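
In HBase that trade-off mostly comes down to the ZooKeeper session timeout.  A rough sketch
with an illustrative (not recommended) value; in practice this is set in hbase-site.xml on the
servers, shown here via the Configuration API only for brevity:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FastDetectionConfig {
  public static Configuration make() {
    Configuration conf = HBaseConfiguration.create();
    // Shorter ZooKeeper session timeout = faster detection of a dead Master or
    // RegionServer, but a long GC pause or network blip is more likely to be
    // treated as a failure.  30s here is purely illustrative.
    conf.setInt("zookeeper.session.timeout", 30000);
    return conf;
  }
}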

> (4) Storage and analysis of sensor data
> If the row key is (sensor_id) or (sensor_id, timestamp), Cassandra can
> hash
> the row key and distribute inserts from many sensors to the entire
> cluster
> (no hotspot). Though MapReduce framework may throw commands to all
> nodes,
> the nodes that do not have related data will not do anything nor waste
> CPU
> or I/O resources.
> ## My question:
> Is there any case study where HBase is used as a storage for sensor
> data?
> ##

Not sure I fully understand your question.

HBase has ordered tables, so there are hotspots if a particular row, or sensor, has much more
load than others.  A given row will only be served by one node (today, read-only slave copies
have been discussed as a future feature).  Cassandra generally does not have this issue.

HBase will automatically split regions (shards) as they grow, so some shards may have lots
of small rows and some only a few large rows; splitting takes care of balancing by size.  In the
future we will be considering things like read/write load to also help with load balancing
and region splitting, but these do not exist today.
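
To make that concrete for the sensor case, the usual HBase layout would be a composite row key
of (sensor_id, timestamp), which keeps each sensor's readings contiguous and makes time-range
scans (and MapReduce over them) cheap.  A rough sketch; the table, family, and key formats are
made up, and it assumes writes are spread over many sensors so no single region becomes a hotspot:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SensorScan {
  // Composite row key: fixed-width sensor id followed by a big-endian timestamp,
  // so all readings for one sensor sort together in time order.
  static byte[] rowKey(String sensorId, long timestampMs) {
    return Bytes.add(Bytes.toBytes(sensorId), Bytes.toBytes(timestampMs));
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "sensor_readings");

    // Write one reading.
    Put put = new Put(rowKey("sensor-0042", System.currentTimeMillis()));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes("23.5"));
    table.put(put);

    // Range scan: all readings for sensor-0042 in the last hour.
    long now = System.currentTimeMillis();
    Scan scan = new Scan(rowKey("sensor-0042", now - 3600 * 1000L),
                         rowKey("sensor-0042", now));
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("value"))));
    }
    scanner.close();
    table.close();
  }
}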

The flipside to this is that Cassandra carries the same in-memory data on every replica (because
data can be read/written from multiple nodes, it must live on all these nodes), whereas HBase
only carries it once on one server.  The replication in HBase is at the DFS level not the
DB level.  So across a cluster, you can effectively only have 1/3 of the total memory available
for unique data with Cassandra, if that makes sense.
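
To put rough, purely illustrative numbers on that: with a replication factor of 3 and 30 nodes
each devoting 16 GB to in-memory data, Cassandra can hold roughly 30 x 16 / 3 = 160 GB of
distinct hot data, while HBase (which serves each region from a single RegionServer and
replicates only at the HDFS layer) can hold roughly the full 480 GB.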


In the end, both have strengths and weaknesses.  Quite honestly the requirement of not having
data unavailable for more than 1 second likely takes HBase out of the running because
under hard RegionServer failure, you will almost certainly have regions offline for longer
than that.  We'll continue improving here, and if you are not including the time for fault
detection, it is feasible that we could get down into the realm of 1 second, though in this
case you'd likely have a period of "eventual consistency" in which you would be able to access
a region while the log replay was going on.

Otherwise, to deal with better availability under faults, you'd have to rely on read-only
replicas in which case you could continue reading but not be able to write when the primary
RS is unavailable.

Good luck!

JG
