From: "Dmytro Molkov (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Thu, 11 Mar 2010 05:27:27 +0000 (UTC)
Subject: [jira] Updated: (HDFS-854) Datanode should scan devices in parallel to generate block report

[ https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Molkov updated HDFS-854:
-------------------------------

    Attachment: HDFS-854.patch

Please have a look at the patch.

The problem we are trying to solve here is generating the first block report more quickly after a restart by scanning the volumes in parallel. Instead of scanning 12 TB of data sequentially, we scan 12 chunks of 1 TB each in parallel, one per volume. Because the scan is dominated by I/O latency, this cuts the time to generate the block report severalfold.

The test for this simply runs the directory scanner test twice: once with parallel execution and once without it.

> Datanode should scan devices in parallel to generate block report
> -----------------------------------------------------------------
>
>                 Key: HDFS-854
>                 URL: https://issues.apache.org/jira/browse/HDFS-854
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: dhruba borthakur
>            Assignee: Dmytro Molkov
>         Attachments: HDFS-854.patch
>
>
> A Datanode should scan its disk devices in parallel so that the time to generate a block report is reduced. This will reduce the startup time of a cluster.
> A datanode has 12 disks (1 TB each) to store HDFS blocks. There is a total of 150K blocks on these 12 disks. It takes the datanode up to 20 minutes to scan these devices to generate the first block report.
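
For readers who want a concrete picture of the approach described in the comment, here is a minimal Java sketch of scanning volumes in parallel with a fixed thread pool. It is not the code in HDFS-854.patch: the class ParallelVolumeScan and the methods listBlockFiles and scanAll are hypothetical names chosen for illustration, and the file-name filter (a "blk_" prefix, skipping ".meta" files) is an assumption about how block files are laid out on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Illustrative sketch only; all names here are hypothetical. */
class ParallelVolumeScan {

    /** Recursively collects the names of files under dir that look like block files. */
    static List<String> listBlockFiles(File dir) {
        List<String> found = new ArrayList<>();
        File[] entries = dir.listFiles(); // null if dir is missing or unreadable
        if (entries == null) {
            return found;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                found.addAll(listBlockFiles(entry));
            } else if (entry.getName().startsWith("blk_")
                    && !entry.getName().endsWith(".meta")) {
                // Assumed naming convention: block data files start with "blk_".
                found.add(entry.getName());
            }
        }
        return found;
    }

    /**
     * Applies scanOne to every volume using up to `threads` worker threads
     * and returns the combined results in the order the volumes were given.
     */
    static <V> List<String> scanAll(List<V> volumes,
                                    Function<V, List<String>> scanOne,
                                    int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per volume so slow disks overlap instead of queueing.
            List<Future<List<String>>> pending = new ArrayList<>();
            for (V volume : volumes) {
                Callable<List<String>> task = () -> scanOne.apply(volume);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until each scan is done.
            List<String> report = new ArrayList<>();
            for (Future<List<String>> result : pending) {
                report.addAll(result.get());
            }
            return report;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scan interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("volume scan failed", e.getCause());
        } finally {
            // Stops the workers; also cancels outstanding scans if one failed.
            pool.shutdownNow();
        }
    }
}
```

Given a list of root directories, one per disk, a call such as ParallelVolumeScan.scanAll(roots, ParallelVolumeScan::listBlockFiles, roots.size()) would give each volume its own thread. Passing 1 as the thread count gives the sequential behaviour for comparison, which mirrors the idea of running the same test with and without parallel execution.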