hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
Date Mon, 06 Jun 2011 04:13:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044700#comment-13044700 ]

Aaron T. Myers commented on HDFS-2004:

bq. I'm vetoing the very concept of a client being able to dictate to the NN how it should
replicate the data. 

"Dictate" is a strong word, and isn't necessarily what's being proposed here. What if this
theoretical facility only allowed clients to "request" that a block move to some DN, with
the NN being able to then make the final call?

I don't think it's reasonable to veto an idea before there's been any proposed design or implementation.

bq. It won't scale up very well without having severe performance consequences to the NN.

That's not necessarily true. It depends upon the implementation, which we haven't seen yet.
As Todd said earlier, "The difficulties in implementation are obvious - eg you don't want
it to fight against a balancer or other placement policies in action on the cluster. But that's
a matter to evaluate after the work is done, if someone is willing to put forth the work."

bq. It should also be pointed out that the HBASE example is essentially bypassing HDFS to
talk directly to the underlying file system via mmap(). We should not encourage such bad behavior.

This isn't about "bypassing HDFS" - it's about making the interface to HDFS more capable/performant
for a specific type of client. HDFS already makes an effort to ensure that clients that are
local to a DN which write a block will have one replica of that block placed on that DN, at
least initially. I don't see how adding an interface to *request* (not *require*) the NN move
a block replica to a specific DN is meaningfully different than that already-existing HDFS
facility. The only distinction is whether the request is done implicitly at file-write time
because the client is collocated with the DN, or explicitly at a later time.
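
To make the request-versus-require distinction concrete, here is a rough sketch of how
the NN side of such a hint could behave. This is purely illustrative: PlacementHintSketch,
ClusterView, Decision and considerHint are hypothetical names, not HDFS APIs, and no such
design has been proposed on this issue. The only point is that the client names a DN and
the NN remains free to decline.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only: none of these names exist in HDFS. */
public class PlacementHintSketch {

    /** Possible outcomes when the NN considers a client's hint. */
    public enum Decision { HONORED, ALREADY_PRESENT, DECLINED }

    /** Toy stand-in for what the NN knows about the cluster. */
    public static final class ClusterView {
        final Set<String> liveNodes;
        final Set<String> overloadedNodes;

        public ClusterView(Set<String> liveNodes, Set<String> overloadedNodes) {
            this.liveNodes = liveNodes;
            this.overloadedNodes = overloadedNodes;
        }
    }

    /**
     * The client names a DN it would like a replica on. The NN makes the
     * final call and may decline, e.g. if the DN is dead or overloaded.
     */
    public static Decision considerHint(Set<String> currentReplicas,
                                        ClusterView cluster,
                                        String requestedNode) {
        if (currentReplicas.contains(requestedNode)) {
            // A replica is already on that DN, so there is nothing to move.
            return Decision.ALREADY_PRESENT;
        }
        if (!cluster.liveNodes.contains(requestedNode)
                || cluster.overloadedNodes.contains(requestedNode)) {
            // The hint is advisory; the NN's own policy wins.
            return Decision.DECLINED;
        }
        return Decision.HONORED;
    }

    public static void main(String[] args) {
        ClusterView cluster = new ClusterView(
            new HashSet<>(Arrays.asList("dn1", "dn2", "dn3", "dn4")),
            new HashSet<>(Arrays.asList("dn4")));
        Set<String> replicas = new HashSet<>(Arrays.asList("dn1", "dn2"));

        System.out.println(considerHint(replicas, cluster, "dn3")); // HONORED
        System.out.println(considerHint(replicas, cluster, "dn1")); // ALREADY_PRESENT
        System.out.println(considerHint(replicas, cluster, "dn4")); // DECLINED
    }
}
```

A real implementation would also have to coordinate with the balancer and other placement
policies, which this sketch ignores entirely.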

To be clear, I'm not volunteering to do this work, and I'm not blocked on it. I just
think that we should allow contributors to post a patch which scratches their itch, and then
evaluate the implementation rather than the idea. If, after an implementation is proposed or
provided, you still have technical objections, then by all means veto away.

> Enable replicating and pinning files to a data node
> ---------------------------------------------------
>                 Key: HDFS-2004
>                 URL: https://issues.apache.org/jira/browse/HDFS-2004
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 0.23.0
>            Reporter: Jason Rutherglen
> Some HDFS applications require that a given file is on the local DataNode.  The functionality created here will allow pinning the file to any DataNode.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
