hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
Date Fri, 18 Jul 2014 22:20:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066988#comment-14066988 ]

Colin Patrick McCabe commented on HDFS-6482:
--------------------------------------------

1 second for 100k blocks is pretty good.

bq. Added a configuration parameter for users to specify the number of threads to be used
in the hard link process.

Perhaps one thread per storage directory would make sense?  I'm not sure a configuration
option is useful if this upgrade is a one-time event (and the DataNodes that would be upgraded
have already been deployed).
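
A minimal sketch of the "one thread per storage directory" idea, to make the suggestion concrete.
The class name PerDirectoryLinker and the DirTask interface are hypothetical and do not come
from the patch; the per-directory work is passed in as a callback so that nothing is assumed
about how the DataNode actually creates its links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerDirectoryLinker {
    /** Hypothetical unit of work: link everything under one storage directory. */
    interface DirTask {
        int linkAll(String storageDir) throws Exception;
    }

    /** Runs the task once per storage directory, each on its own thread. */
    static int linkAllDirs(List<String> storageDirs, final DirTask task) throws Exception {
        // Pool size follows the number of directories, so no config knob is needed.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, storageDirs.size()));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (final String dir : storageDirs) {
                Callable<Integer> job = () -> task.linkAll(dir);
                results.add(pool.submit(job));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows a worker failure as ExecutionException
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

Sizing the pool from the directory list keeps disks busy in parallel without exposing a
tuning parameter that would only matter during the one-time upgrade.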

bq. We use these optimizations for the hard link process only when upgrading to the block
ID-based layout, because otherwise the directory structures of the old and new layouts should
be the same and we can perform fast batch hard links over directories – see HDFS-1445.

Why not always use the native path, if it's faster?  It should be trivial to implement the
"batch" hard link API via the native path.  You'd just write a "for" loop in Java that made
some calls down into the JNI function you already wrote.  There is a new hard link API coming
up in Java 7, so we'll want to stop using the shell thing eventually anyway.
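
A rough sketch of what that "for" loop could look like. Since the JNI function from the patch
is not shown here, this version swaps in java.nio.file.Files.createLink, the standard Java 7
call for creating a hard link; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class BatchHardLinkSketch {
    /**
     * Hard-links every regular file in srcDir into dstDir, one call per file.
     * Returns the number of links created.
     */
    static int linkDirectory(Path srcDir, Path dstDir) throws IOException {
        Files.createDirectories(dstDir);
        int count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
            for (Path src : entries) {
                if (Files.isRegularFile(src)) {
                    // Argument order is (new link, existing file).
                    Files.createLink(dstDir.resolve(src.getFileName()), src);
                    count++;
                }
            }
        }
        return count;
    }
}
```

The loop body is the only part that would change if a native call were used instead, which
is why a single code path for both the upgrade and the same-layout case seems feasible.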

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: 6482-design.doc, HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch,
>                      HDFS-6482.4.patch, HDFS-6482.5.patch, HDFS-6482.6.patch, HDFS-6482.7.patch,
>                      HDFS-6482.8.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many subdirectories
> when capacity is reached. Instead we can use a block's ID to determine the path it should
> go in. This eliminates the need for the LDir data structure that facilitates the splitting
> of directories when they reach capacity as well as fields in ReplicaInfo that keep track of
> a replica's location.
> An extension of the work in HDFS-3290.
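
To illustrate what deriving a path from a block's ID might look like, here is a small sketch.
The two-level fan-out of 256 directories each, the choice of bits, and the "subdir" naming
are assumptions made for this example and are not stated in the description above; the
class and method names are hypothetical as well.

```java
import java.io.File;

final class BlockIdLayoutSketch {
    /**
     * Maps a block ID to a directory under root using two bytes of the ID.
     * Assumed scheme for illustration: bits 16-23 pick the first level,
     * bits 8-15 pick the second level.
     */
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return new File(root, "subdir" + d1 + File.separator + "subdir" + d2);
    }
}
```

Because the directory is a pure function of the ID, no per-replica location field and no
directory-splitting structure is needed to find a block again.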



--
This message was sent by Atlassian JIRA
(v6.2#6252)
