hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5027) On startup, DN should scan volumes in parallel
Date Wed, 24 Jul 2013 00:26:48 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron T. Myers updated HDFS-5027:
---------------------------------

    Attachment: HDFS-5027.patch

This small patch moves the directory scanning from serial to parallel. No tests are included,
since this is an optimization and its correctness should be covered by other tests.
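
For readers without the attachment at hand, the general idea of scanning each volume concurrently can be sketched as follows. This is only a minimal illustration and not the code in HDFS-5027.patch: the class ParallelVolumeScan, the method scanAll, and the scanVolume callback are hypothetical names, and the per-volume scan is replaced by a stand-in that sleeps. The attached patch may well be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Illustrative only: none of these names come from HDFS or from the attached patch.
class ParallelVolumeScan {

    // Runs the (hypothetical) per-volume scan for every volume concurrently,
    // one thread per volume, and returns each volume's result in input order.
    static Map<String, Long> scanAll(List<String> volumes, ToLongFunction<String> scanVolume)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, volumes.size()));
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String volume : volumes) {
                Callable<Long> task = () -> scanVolume.applyAsLong(volume);
                pending.add(pool.submit(task));
            }
            Map<String, Long> results = new LinkedHashMap<>();
            for (int i = 0; i < volumes.size(); i++) {
                // get() blocks until that volume's scan is done and throws
                // ExecutionException if the scan itself failed.
                results.put(volumes.get(i), pending.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> volumes = Arrays.asList("/data/1", "/data/2", "/data/3", "/data/4");
        long start = System.nanoTime();
        // Stand-in scan: sleep to simulate disk I/O, then report a made-up replica count.
        Map<String, Long> replicas = scanAll(volumes, volume -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return 1000L;
        });
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(replicas + " scanned in " + elapsedMs + " ms");
    }
}
```

With one thread per volume, the wall-clock time is bounded by the slowest volume rather than by the sum of all volumes, provided the volumes sit on separate disks and do not contend for the same I/O.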

To confirm that it improves things, I ran this manually on a DN with ~1.2 million blocks. The output
below should be pretty telling:

Serially:
{noformat}
13/07/23 15:30:23 INFO impl.FsDatasetImpl: Adding block pool BP-1553953014-172.29.122.91-1336759982696
13/07/23 15:30:23 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/1/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:30:32 INFO impl.FsDatasetImpl: ====> time taken: 8338
13/07/23 15:30:32 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/2/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:30:40 INFO impl.FsDatasetImpl: ====> time taken: 8093
13/07/23 15:30:40 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/3/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:30:48 INFO impl.FsDatasetImpl: ====> time taken: 7621
13/07/23 15:30:48 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/4/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:30:55 INFO impl.FsDatasetImpl: ====> time taken: 7661
13/07/23 15:30:55 INFO impl.FsDatasetImpl: ====> totalTimeTaken: 31714
{noformat}

Parallel:

{noformat}
13/07/23 15:33:01 INFO impl.FsDatasetImpl: Adding block pool BP-1553953014-172.29.122.91-1336759982696
13/07/23 15:33:01 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/1/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:33:01 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/3/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:33:01 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/2/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:33:01 INFO impl.FsDatasetImpl: ====> adding blockpool for volume /data/4/atm/data/dfs/data-dir-many-blocks/current
13/07/23 15:33:10 INFO impl.FsDatasetImpl: ====> time taken: 9018
13/07/23 15:33:10 INFO impl.FsDatasetImpl: ====> time taken: 9091
13/07/23 15:33:10 INFO impl.FsDatasetImpl: ====> time taken: 9368
13/07/23 15:33:10 INFO impl.FsDatasetImpl: ====> time taken: 9614
13/07/23 15:33:10 INFO impl.FsDatasetImpl: ====> totalTimeTaken: 9616
{noformat}
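
As a rough reading of the two runs, the arithmetic can be spelled out in a few lines. This assumes the logged "time taken" values are milliseconds, which the log timestamps suggest; the class ScanSpeedup and its helpers are hypothetical names used only for this calculation.

```java
import java.util.Arrays;

// Arithmetic on the "time taken" values logged above, assumed to be milliseconds.
class ScanSpeedup {

    static long sum(long... values) {
        return Arrays.stream(values).sum();
    }

    static long max(long... values) {
        return Arrays.stream(values).max().orElse(0L);
    }

    static double speedup(long serialTotal, long parallelTotal) {
        return (double) serialTotal / parallelTotal;
    }

    public static void main(String[] args) {
        long[] serial = {8338, 8093, 7621, 7661};
        long[] parallel = {9018, 9091, 9368, 9614};
        // Serial total should track the sum of the per-volume scans.
        System.out.println("sum of serial scans:     " + sum(serial));
        // Parallel total should track the slowest single volume.
        System.out.println("slowest parallel scan:   " + max(parallel));
        // Ratio of the two logged totalTimeTaken values.
        System.out.println("speedup (31714 / 9616):  " + speedup(31714, 9616));
    }
}
```

The serial per-volume times add up to 31713, one less than the logged total of 31714, while the slowest parallel scan took 9614 against a logged total of 9616. That works out to a speedup of about 3.3x on four volumes, with each individual scan running somewhat slower in parallel (roughly 9.0 to 9.6 seconds versus 7.6 to 8.3 seconds serially).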
                
> On startup, DN should scan volumes in parallel
> ----------------------------------------------
>
>                 Key: HDFS-5027
>                 URL: https://issues.apache.org/jira/browse/HDFS-5027
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.4-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: HDFS-5027.patch
>
>
> On startup the DN must scan all replicas on all configured volumes before the initial
> block report to the NN. This is currently done serially, but can be done in parallel to improve
> startup time of the DN.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
