hadoop-hdfs-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2004) Enable replicating and pinning files to a data node
Date Mon, 06 Jun 2011 16:40:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044943#comment-13044943 ]

Allen Wittenauer commented on HDFS-2004:
----------------------------------------

bq. What if this theoretical facility only allowed clients to "request" that a block move
to some DN, with the NN being able to then make the final call?

I'd say I'm not interested, and one should probably use a different file system that meets
those needs.  I'd also point out that HDFS provides an interface for discovering where a
block is located, and that the client's scheduling algorithm should take that information
into account.
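For reference, the discovery interface mentioned above is the public FileSystem#getFileBlockLocations API. A minimal sketch follows; it runs against the local file system purely for illustration (on a real cluster you would obtain the FileSystem from an HDFS URI), and the file name is hypothetical:

```java
// Sketch only: shows the public FileSystem#getFileBlockLocations API.
// Uses the local file system as a stand-in for a live HDFS cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocalityDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.getLocal(conf); // stand-in for HDFS

        Path p = new Path("locality-demo.txt");    // hypothetical file name
        FSDataOutputStream out = fs.create(p, true);
        out.writeBytes("some data");
        out.close();

        // A locality-aware client asks where each block lives and schedules
        // its computation on those hosts, instead of asking HDFS to move or
        // pin the block.
        FileStatus st = fs.getFileStatus(p);
        for (BlockLocation loc : fs.getFileBlockLocations(st, 0, st.getLen())) {
            for (String host : loc.getHosts()) {
                System.out.println("block at offset " + loc.getOffset()
                        + " is on " + host);
            }
        }
        fs.delete(p, false);
    }
}
```

On the local file system this reports a single location on localhost; on HDFS it reports the DataNodes holding each replica, which is exactly the information a scheduler needs to move code to the data.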

One of the big selling points for Hadoop is that the code gets moved to the data.  This proposed
API is the equivalent of saying "No, the data should actually get moved."

bq. This isn't about "bypassing HDFS" - it's about making the interface to HDFS more capable/performant
for a specific type of client.

The cited example definitely is.  Go read the HBase case; mmap() is mentioned several times.
If that isn't bypassing HDFS, I don't know what is.  The HBase case will basically lead to
broken clients if/when the on-disk block format changes.  For example, what happens if someone
adds on-disk encryption?

I posit that the *only* reason a client would request to move a block is that it is doing something
it shouldn't be doing.  Yes, I understand the "for long-running clients this should be a perf
gain" argument.  I'd argue that long-running clients should use a memory cache or a different
file system rather than hammer HDFS continually for the same blocks.

bq. I just think that we should allow contributors to post a patch which scratches their
itch, and then evaluate the implementation, not the idea, of it.

You are certainly entitled to your opinion.  I'm also entitled to mine.  -1 it is.


> Enable replicating and pinning files to a data node
> ---------------------------------------------------
>
>                 Key: HDFS-2004
>                 URL: https://issues.apache.org/jira/browse/HDFS-2004
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer
>    Affects Versions: 0.23.0
>            Reporter: Jason Rutherglen
>
> Some HDFS applications require that a given file is on the local DataNode.  The functionality
> created here will allow pinning the file to any DataNode.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
