hadoop-hdfs-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: [VOTE -- Round 2] Commit hdfs-630 to 0.21?
Date Fri, 22 Jan 2010 18:25:07 GMT
Hi Steve,

All of the below may be good ideas, but I don't think they're relevant to
the discussion at hand. Specifically, none of them can enter 0.21 without a
vote as they'd be new features, and it doesn't even sound like there's a
JIRA out for them yet. Let's not put off a well-known improvement patch
waiting for one that doesn't even exist yet. If we want to get the ideas
below into 0.22 or a later version, let's open a JIRA and discuss them there
rather than in this vote thread.

As for the patch, I'm +1. It certainly is a large improvement on small
clusters - without it, in a three node cluster, you cannot successfully kill
a DN while doing an fs -put, even if your min.replication is 1. As Ryan
mentioned above, this is a huge problem since new users may evaluate Hadoop
on a 3-node cluster, figure "hey, let's see fault tolerance in action" and
then be entirely put off when their kill -9 takes the cluster to a
screeching halt.
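To make the small-cluster failure concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not HDFS code: `choose_targets` and `write_block` are hypothetical names, the NameNode is assumed not to have noticed the dead DN yet (so it keeps offering it), and the retry limit is arbitrary. Without a client-side exclusion list, every new pipeline on a 3-node cluster includes the dead node; with one, the client asks for targets that skip it and succeeds as long as `min_replication` can still be met.

```python
def choose_targets(cluster, replication, excluded=frozenset()):
    """Hypothetical NN placement: return up to `replication` nodes the NN
    believes are alive, skipping any node the client asked to exclude."""
    candidates = [dn for dn in cluster if dn not in excluded]
    return candidates[:replication]


def write_block(cluster, dead, replication, min_replication,
                use_exclusion, max_retries=5):
    """Toy write path: returns (succeeded, attempts_used).

    `dead` holds DNs that are down but that the NN has not yet detected.
    `max_retries` is an arbitrary assumption, not an HDFS setting.
    """
    excluded = set()
    for attempt in range(1, max_retries + 1):
        targets = choose_targets(
            cluster, replication, excluded if use_exclusion else frozenset())
        if len(targets) < min_replication:
            return False, attempt          # not enough nodes left at all
        bad = [dn for dn in targets if dn in dead]
        if not bad:
            return True, attempt           # pipeline built on live nodes only
        # Pipeline setup fails at the dead node; the client abandons the
        # block and remembers that node for its next request.
        excluded.add(bad[0])
    return False, max_retries


cluster = ["dn1", "dn2", "dn3"]
print(write_block(cluster, {"dn2"}, 3, 1, use_exclusion=False))  # (False, 5)
print(write_block(cluster, {"dn2"}, 3, 1, use_exclusion=True))   # (True, 2)
```

On a large cluster the NN's random placement would usually avoid the dead node by chance, which is why the problem shows up mostly on small clusters.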

Thanks
-Todd

On Fri, Jan 22, 2010 at 7:32 AM, Steve Loughran <stevel@apache.org> wrote:

> Stack wrote:
>
> I'm being 0 on this
>
> -I would worry if the exclusion list were used by the NN to do its
> blacklisting; I'm glad to see this isn't happening. Yes, you could pick up
> datanode failure faster, but you would also be vulnerable to a user doing a
> DoS against the cluster by reporting every DN as failing
>
> -Russ Perry's work on high-speed Hadoop rendering [1] tweaked Hadoop to
> allow the datanodes to get the entire list of nodes holding the data, and
> allowed them to make their own decision about where to get the data from.
> This:
>  1. pushed the policy of handling failure down to the clients, so there is
> less need to talk to the NN about it.
>  2. lets you do something very fancy where you deliberately choose data
> from different DNs, so that you can then pull data off the cluster at the
> full bandwidth of every disk.
>
> Long term, I would like to see Russ's addition go in, so I worry whether the
> HDFS-630 patch would be useful long term. Maybe it's a more fundamental
> issue: where does the decision making go, into the clients or into the NN?
>
> -steve
>
>
>
> [1] http://www.hpl.hp.com/techreports/2009/HPL-2009-345.html
>
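The client-side approach Steve describes (the client sees every replica location and decides where to read from) can be sketched as below. This is a hypothetical illustration, not the code from the HP report or any HDFS API: `plan_reads`, the block-to-replica map, and the greedy least-loaded policy are all assumptions. It shows the two properties listed in the quoted message: failed DNs are handled locally by the client, and reads are spread across different DNs.

```python
from collections import Counter


def plan_reads(block_locations, failed=frozenset()):
    """Hypothetical client-side planner: map each block to one replica DN.

    block_locations: dict of block id -> list of DNs holding a replica.
    failed: DNs the client itself has seen fail; skipped without asking the NN.
    Greedy policy (an assumption): pick the least-loaded live replica so reads
    are spread over as many DNs (and disks) as possible.
    """
    load = Counter()
    plan = {}
    for block, replicas in block_locations.items():
        live = [dn for dn in replicas if dn not in failed]
        if not live:
            raise IOError(f"no live replica for {block}")
        # Least-loaded first; ties broken by name so the plan is deterministic.
        dn = min(live, key=lambda d: (load[d], d))
        plan[block] = dn
        load[dn] += 1
    return plan


blocks = {"b1": ["dn1", "dn2"], "b2": ["dn1", "dn3"], "b3": ["dn2", "dn3"]}
print(plan_reads(blocks))                  # {'b1': 'dn1', 'b2': 'dn3', 'b3': 'dn2'}
print(plan_reads(blocks, failed={"dn1"}))  # {'b1': 'dn2', 'b2': 'dn3', 'b3': 'dn2'}
```

Because the failure handling lives entirely in the client, a misbehaving client can only hurt its own reads, which sidesteps the DoS concern raised about NN-side blacklisting.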
