hbase-dev mailing list archives

From Chad Walters <Chad.Walt...@microsoft.com>
Subject Re: ZK rethink?
Date Mon, 13 Apr 2009 02:46:30 GMT

I share some of your disappointment with HDFS and its narrow focus. Please feel free to investigate
KFS and report back. I am interested to know the results -- just not interested enough at
this time to divert our resources towards something exploratory when there are clear and important
improvements to HBase itself directly in front of us.


On 4/12/09 1:00 PM, "Ryan Rawson" <ryanobjc@gmail.com> wrote:

I've been in communication with Rao, and he suggests using the SVN tip;
they are using that for production.  Also patching Hadoop's layer to
overcome those bugs.  I'm going to try it again soon using the new
recommendations.  I'll let you know.  One possibility if we are seeing RAM
issues is to link KFS with tcmalloc.

While in theory HDFS has more people banging on it, I've become a little
disillusioned with HDFS as a product itself - it seems like if a feature is
not required for map-reduce, it isn't implemented.  While in theory we can
go and fix these bugs with HDFS (including a complete rewrite of datanode to
excise threads), I was thinking, why not use something that has an
established codebase and developer working on big problems.

On Sun, Apr 12, 2009 at 12:44 PM, Andrew Purtell <apurtell@apache.org> wrote:

> I ran an initial test of HBase on top of KFS -- using the
> Hadoop FS abstraction layer, which is the only current
> integration option -- and ran into some trouble under load.
> I tested with kfs-0.2.3 on 0.19.1 plus the patch for
> HADOOP-5292. It may have just been a function of the scale
> of the deployment (all localhost :-) ) but the chunkserver
> after a time began to expand its address space by gigs,
> beyond 40 GB of address space in one instance before I
> killed it. I may play around with it again on a testbed of
> several nodes at some point.
>   - Andy
> > From: Chad Walters
> > Subject: Re: ZK rethink?
> > Date: Sunday, April 12, 2009, 7:43 AM
> > My understanding is that Quantcast was running both HDFS and
> > KFS in parallel since they didn't fully trust either
> > one. Can anyone confirm or deny this? Have they switched
> > over fully to using KFS?
> >
> > KFS seemed interesting but, given that more development
> > effort is directed at HDFS, it didn't seem worth
> > pursuing. Of course, as you point out, HDFS has been
> > slow/resistant to implementing some features important for
> > HBase that KFS supports out of the box.
> >
> > As things currently stand, it is unlikely that folks from
> > the Powerset team will lead any investigation into KFS.
> > However, others are welcome to spend some time digging into
> > it and seeing how promising the prospect is.
> >
> > I believe that KFS has an implementation of the Hadoop File
> > System interface. It seems like someone who was interested
> > and motivated could hook up KFS under HBase that way to do
> > some basic testing. Perhaps some benchmarking with KFS would
> > turn up some numbers that could be used to spur some changes
> > at HDFS...
> >
> > Doing deeper integration with KFS that avoided the Hadoop
> > FileSystem abstraction seems like it would be a mistake (but
> > I could be convinced otherwise). Proposing possible
> > extensions to the Hadoop FileSystem interface might be more
> > fruitful.
> >
> > Chad
