hadoop-hdfs-issues mailing list archives

From "Dmytro Molkov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-854) Datanode should scan devices in parallel to generate block report
Date Thu, 11 Mar 2010 05:27:27 GMT

     [ https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Molkov updated HDFS-854:
-------------------------------

    Attachment: HDFS-854.patch

Please have a look at the patch.

The problem we are trying to solve here is generating the first block report more quickly
after a restart by scanning the volumes in parallel. Instead of scanning 12 TB of data
sequentially, we scan 12 volumes of 1 TB each in parallel. Because each scan spends most of
its time waiting on disk IO, and the volumes are separate devices, generating the block
report becomes several times faster.
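
As an illustration only (this is not the code in HDFS-854.patch), the idea can be sketched
with a fixed thread pool that runs one scan task per volume. The class name
ParallelVolumeScanner, the methods scanVolume and scanVolumes, and the rule that block files
are named "blk_*" without a ".meta" suffix are all assumptions made for this sketch; the
real datanode scanner tracks more state than a list of file names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: one scan task per volume, run on a shared thread pool.
final class ParallelVolumeScanner {

    // Walks a single volume and returns the sorted names of its block files.
    // Assumption: block files start with "blk_" and metadata files end in ".meta".
    public static List<String> scanVolume(Path volumeRoot) {
        try (Stream<Path> files = Files.walk(volumeRoot)) {
            return files.filter(Files::isRegularFile)
                        .map(p -> p.getFileName().toString())
                        .filter(n -> n.startsWith("blk_") && !n.endsWith(".meta"))
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Scans all volumes using up to `threads` worker threads.
    // With threads == 1 the volumes are scanned one after another.
    public static Map<Path, List<String>> scanVolumes(List<Path> volumes, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            // Submit every volume first so the scans can overlap.
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Path volume : volumes) {
                Callable<List<String>> task = () -> scanVolume(volume);
                futures.add(pool.submit(task));
            }
            // Collect results in volume order; get() blocks until that scan is done.
            Map<Path, List<String>> report = new LinkedHashMap<>();
            for (int i = 0; i < volumes.size(); i++) {
                report.put(volumes.get(i), futures.get(i).get());
            }
            return report;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling scanVolumes(volumes, 1) and scanVolumes(volumes, volumes.size()) should return the
same map, which mirrors the serial-versus-parallel comparison one would want from a test.
Any speedup depends on the volumes sitting on separate physical devices.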

The test for this is just running the directory scanner test twice: with parallel execution
and without it.

> Datanode should scan devices in parallel to generate block report
> -----------------------------------------------------------------
>
>                 Key: HDFS-854
>                 URL: https://issues.apache.org/jira/browse/HDFS-854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-854.patch
>
>
> A Datanode should scan its disk devices in parallel so that the time to generate a block
> report is reduced. This will reduce the startup time of a cluster.
> A datanode has 12 disks (each of 1 TB) to store HDFS blocks. There is a total of 150K
> blocks on these 12 disks. It takes the datanode up to 20 minutes to scan these devices to
> generate the first block report.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

