Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Message-ID: <770887429.196901268285247450.JavaMail.jira@brutus.apache.org>
Date: Thu, 11 Mar 2010 05:27:27 +0000 (UTC)
From: "Dmytro Molkov (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Updated: (HDFS-854) Datanode should scan devices in parallel
 to generate block report
In-Reply-To: <1826844919.1261898129464.JavaMail.jira@brutus.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Molkov updated HDFS-854:
-------------------------------

    Attachment: HDFS-854.patch

Please have a look at the patch.

The problem we are trying to solve here is generating the first block report faster after a restart, by scanning the volumes in parallel. Instead of scanning 12 TB of data sequentially, we scan twelve 1 TB chunks in parallel. Because disk I/O has high latency, this makes block report generation a few times faster.

The test for this simply runs the directory scanner test twice: once with parallel execution and once without it.
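The approach can be sketched roughly as follows: submit one scan task per volume to a thread pool and merge the per-volume results into a single report. With one thread the same code gives the sequential baseline, which loosely mirrors the two test runs described above. This is a hypothetical illustration, not the code in HDFS-854.patch. The names `ParallelVolumeScanner`, `scanVolumes` and `scanDir` are made up, and treating every `blk_*` file without a `.meta` suffix as a block is a simplifying assumption about the on-disk layout.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: collect block file names from each volume, optionally in parallel. */
public class ParallelVolumeScanner {

    /** Recursively collects block file names (assumed "blk_" prefix, excluding ".meta") under dir. */
    public static void scanDir(File dir, List<String> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or an I/O error occurred
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                scanDir(f, out);
            } else if (f.getName().startsWith("blk_") && !f.getName().endsWith(".meta")) {
                out.add(f.getName());
            }
        }
    }

    /** Scans all volumes: one thread per volume when parallel, otherwise a single thread. */
    public static Set<String> scanVolumes(List<File> volumes, boolean parallel) throws Exception {
        int threads = parallel ? Math.max(1, volumes.size()) : 1;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (File volume : volumes) {
                // Each task scans one volume independently, so the disks' I/O can overlap.
                futures.add(pool.submit(() -> {
                    List<String> found = new ArrayList<>();
                    scanDir(volume, found);
                    return found;
                }));
            }
            Set<String> report = new TreeSet<>();
            for (Future<List<String>> f : futures) {
                report.addAll(f.get()); // blocks until that volume's scan finishes
            }
            return report;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Build two fake volumes under a temp directory for demonstration.
        Path root = Files.createTempDirectory("volumes");
        List<File> volumes = new ArrayList<>();
        for (int v = 0; v < 2; v++) {
            Path sub = Files.createDirectories(root.resolve("vol" + v).resolve("current"));
            Files.createFile(sub.resolve("blk_" + v));
            Files.createFile(sub.resolve("blk_" + v + "_1001.meta"));
            volumes.add(root.resolve("vol" + v).toFile());
        }
        // Run the same scan sequentially and in parallel; both should find the same blocks.
        Set<String> sequential = scanVolumes(volumes, false);
        Set<String> parallel = scanVolumes(volumes, true);
        System.out.println(sequential.equals(parallel) + " " + parallel); // prints "true [blk_0, blk_1]"
    }
}
```

Sizing the pool to the number of volumes keeps one outstanding scan per disk. Because each volume sits on a separate device, the tasks mostly wait on independent I/O, and overlapping that waiting is where a speedup like the one described above would come from.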

> Datanode should scan devices in parallel to generate block report
> -----------------------------------------------------------------
>
>                 Key: HDFS-854
>                 URL: https://issues.apache.org/jira/browse/HDFS-854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-854.patch
>
>
> A Datanode should scan its disk devices in parallel so that the time to generate a block report is reduced. This will reduce the startup time of a cluster.
> A datanode has 12 disks (each of 1 TB) to store HDFS blocks. There is a total of 150K blocks on these 12 disks. It takes the datanode up to 20 minutes to scan these devices to generate the first block report.
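As a rough back-of-the-envelope reading of these figures, assume that blocks are spread evenly across the disks and that the sequential scan time is the sum of the per-disk scan times. A parallel scan then cannot finish before the slowest disk does, and the slowest disk takes at least the average per-disk time. This is an idealized bound only. Real scans also share CPU and memory, so they will not overlap perfectly.

```latex
\frac{150{,}000\ \text{blocks}}{12\ \text{disks}} = 12{,}500\ \text{blocks per disk},
\qquad t_{\text{seq}} \approx 20\ \text{min} = 1200\ \text{s}

t_{\text{par}} \;\ge\; \max_i t_i \;\ge\; \frac{1}{12}\sum_{i=1}^{12} t_i
  = \frac{t_{\text{seq}}}{12} \approx \frac{1200\ \text{s}}{12} = 100\ \text{s}
```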

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.