hadoop-common-user mailing list archives

From Allen Wittenauer ...@yahoo-inc.com>
Subject Re: Hadoop HDFS + ZFS (RE: Some Doubts of hadoop functionality)
Date Mon, 07 Jan 2008 20:33:10 GMT
On 1/5/08 12:34 AM, "Greg Connor" <gconnor@shutterfly.com> wrote:
> Since you mentioned ZFS, I went and looked at it today, and it definitely is
> all kinds of cool.  ZFS is an excellent example of a robust, feature-rich
> filesystem, at least if it does what its documentation claims it does.

    It does for the most part, but there are some key features still
missing.  The biggest one, IMHO, being disk evacuation.  [FWIW, I've been
using ZFS for a few years now.  Yes, I'm ex-Sun.]

    Also be aware that I'm fairly sure you can't grow an existing RAIDZ
group by adding disks to it; you can only add more groups to the pool.

> So I'm thinking, we really need to get these two together... I think they
> would get along famously.

    It should be noted that Sun bought Cluster File Systems, Inc., in order
to own Lustre.  The stated goal is to merge ZFS and Lustre.

> I would really *love* to see Hadoop pick up some of
> the same features, especially snapshot/clone and parity blocks.

    You can do an offline snapshot now, but being able to take one while
the filesystem is online would be awesome.  We've been pushing the devs for this
functionality for a while now as well.  [Although, I don't remember if there
is a JIRA for this.]

    One of the big questions we had recently after an outage is what kind of
performance hit we would take from maintaining 3 or more copies of the
fsimage and edits files.  I've been thinking that having three copies
(instead of the two that we normally configure) would provide at least some
name node parity.
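
    As a minimal sketch of the three-copy setup described above, assuming a
hadoop-site.xml of this era: dfs.name.dir accepts a comma-separated list of
directories, and the name node keeps a copy of the fsimage and edits files in
each one.  The three paths below are hypothetical placeholders; in practice
they would sit on separate disks, possibly with one on a remote mount.

```xml
<!-- hadoop-site.xml fragment; all three paths are hypothetical -->
<property>
  <name>dfs.name.dir</name>
  <!-- comma-separated list, written here without spaces: each directory
       holds a full copy of the fsimage and edits files -->
  <value>/disk1/hdfs/name,/disk2/hdfs/name,/mnt/remote/hdfs/name</value>
</property>
```

    Since each edit has to reach every listed directory, the cost of a third
copy presumably depends on the slowest device in the list.  This fragment
only shows the configuration; it does not measure that cost.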

> I'm guessing 
> it won't do so in the near future, but hopefully some other product will come
> along soon that does for the distributed-storage world what ZFS does for the
> single machine.

    It's interesting to think what NFSv4.1/pNFS + ZFS would bring...
