hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Using HBase on other file systems
Date Sun, 09 May 2010 16:44:42 GMT
Our experience with Gluster 2 is that self-heal when a brick drops off the network is very
painful. The high performance impact lasts for a long time. I'm not sure, but I think Gluster
3 may re-replicate only the missing sections instead of entire files. On the other hand, I would
not trust Gluster 3 to be stable (yet).

I've also tried KFS. My experience seems to bear out other observations that it is ~30% slower
than HDFS. Also, I was unable to keep the chunkservers up on my CentOS 5 based 64-bit systems.
I gave Sriram shell access so he could poke around coredumps with gdb but there was no satisfactory

Another team at Trend is looking at Ceph. I think it is a highly promising filesystem but
at the moment it is an experimental filesystem undergoing a high rate of development that
requires another experimental filesystem undergoing a high rate of development (btrfs) for
recovery semantics, and the web site warns "NOT SAFE YET" or similar. I doubt it has ever
been tested on clusters > 100 nodes. In contrast, HDFS has been running in production on
clusters with 1000s of nodes for a long time. 

There currently is not a credible competitor to HDFS, in my opinion. Ceph is definitely worth
keeping an eye on, however. I wonder if HDFS will evolve to offer a similar scalable metadata
service (NameNode) to compete. Certainly that would improve its scalability and availability
story, both issues today presenting barriers to adoption, and barriers for anything layered
on top, like HBase. 

   - Andy

> From: Kevin Apte 
> Subject: Using HBase on other file systems
> To: hbase-user@hadoop.apache.org
> Date: Sunday, May 9, 2010, 5:08 AM
> I am wondering if anyone has thought
> about using HBase on other file systems like "Gluster".  I
> think Gluster may offer much faster performance without
> exorbitant cost.  With Gluster, you would have to
> fetch the data from the "Storage Bricks" and process it in
> your own environment. This allows the
> servers that are used as storage nodes to be very cheap.

