Date: Wed, 21 Jan 2015 23:05:36 +0000 (UTC)
From: "Colin Patrick McCabe (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

    [ https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286495#comment-14286495 ]

Colin Patrick McCabe commented on HDFS-7430:
--------------------------------------------

It is fair to call this a rewrite of major parts of the block scanner. I don't think it makes sense to maintain two block scanners in parallel. There would have to be a lot of glue code and extra interfaces to get both working.
Let's let this soak in trunk for a while and then merge to branch-2 when it is stabilized, the same as we did with other things such as truncate.

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, HDFS-7430.007.patch, HDFS-7430.008.patch, HDFS-7430.009.patch, HDFS-7430.010.patch, HDFS-7430.011.patch, HDFS-7430.012.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by keeping track of what block was scanned last, rather than by tracking the scan status of all blocks in memory. Also, instead of having just one thread, we should have a verification thread per hard disk (or other volume), scanning at a configurable rate of bytes per second.
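
For readers who want a concrete picture of the idea in the issue description, here is a minimal Java sketch. It is not the code from the attached patches. Every name in it (VolumeScannerSketch, Volume, CursorScanner, scanAll, throttleMillis) is hypothetical, and an in-memory TreeMap stands in for a volume's on-disk block listing. It only illustrates the three points from the description: each scanner keeps a single cursor (the last block ID it scanned) instead of per-block status, there is one scanner thread per volume, and each scanner pauses to stay near a configured bytes-per-second rate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class VolumeScannerSketch {

    /** Hypothetical volume: block ID -> block length in bytes, ordered by ID. */
    static final class Volume {
        final TreeMap<Long, Long> blocks = new TreeMap<>();
    }

    /** Scans one volume, remembering only the ID of the last block it visited. */
    static final class CursorScanner implements Runnable {
        private final Volume volume;
        private final long bytesPerSecond;
        private Long lastScannedId;   // the cursor; null until the first block is scanned
        private long bytesScanned;

        CursorScanner(Volume volume, long bytesPerSecond) {
            this.volume = volume;
            this.bytesPerSecond = bytesPerSecond;
        }

        /** Returns the block after the cursor, or null when the pass is complete. */
        Map.Entry<Long, Long> next() {
            return lastScannedId == null
                    ? volume.blocks.firstEntry()
                    : volume.blocks.higherEntry(lastScannedId);
        }

        /** Milliseconds to pause so that the given byte count fits the configured rate. */
        static long throttleMillis(long bytes, long bytesPerSecond) {
            return bytes * 1000L / bytesPerSecond;
        }

        @Override
        public void run() {
            Map.Entry<Long, Long> block;
            while ((block = next()) != null) {
                // A real scanner would read the block and verify its checksums here.
                lastScannedId = block.getKey();
                bytesScanned += block.getValue();
                try {
                    Thread.sleep(throttleMillis(block.getValue(), bytesPerSecond));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }

        Long lastScannedId() { return lastScannedId; }
        long bytesScanned() { return bytesScanned; }
    }

    /** Runs one scanner thread per volume and waits for every pass to finish. */
    static List<CursorScanner> scanAll(List<Volume> volumes, long bytesPerSecond) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, volumes.size()));
        List<CursorScanner> scanners = new ArrayList<>();
        for (Volume v : volumes) {
            CursorScanner s = new CursorScanner(v, bytesPerSecond);
            scanners.add(s);
            pool.execute(s);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("scan did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for scanners", e);
        }
        return scanners;
    }

    public static void main(String[] args) {
        Volume a = new Volume();
        a.blocks.put(1L, 4096L);
        a.blocks.put(2L, 8192L);
        Volume b = new Volume();
        b.blocks.put(7L, 2048L);
        for (CursorScanner s : scanAll(Arrays.asList(a, b), 1_000_000L)) {
            System.out.println("last block " + s.lastScannedId() + ", bytes " + s.bytesScanned());
        }
    }
}
```

Because the cursor is the only scan state, a scanner could in principle persist that one value and resume after a restart. Whether and how the actual patch does so is not covered by this sketch.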