Date: Wed, 18 Sep 2013 00:01:52 +0000 (UTC)
From: "Arpit Agarwal (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything

    [ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770189#comment-13770189 ]

Arpit Agarwal commented on HDFS-5031:
-------------------------------------

Hi Vinay, thanks for the updated patch. I verified that the new test case fails without your code changes.

The patch looks good except for one point. I am still not convinced that the assignment to {{lastReadFile}} before the call to {{readNext}} is correct. Is {{lastReadFile}} meant to store the file from which the last line was read?
If so, then the call to {{readNext}} can change {{file}}, or did I misunderstand?
{code}
private void readNext() throws IOException {
  ...
  if (line == null) {
    // move to the next file.
    if (openFile()) {
      readNext();
    }
{code}

{quote}
processedBlocks is reset on every log roll, but bytesLeft is reset only on every startNewPeriod(). So on every log roll bytesLeft was being decremented unnecessarily in assignInitialVerificationTimes(), which resulted in negative values of bytesLeft. Because of this, scanning returned from workRemainingInCurrentPeriod() without scanning the latest blocks. We should decrement it only once after starting the new period.
{quote}
Thanks for the explanation, I understand what you are trying to fix now.

> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans each block twice, and on restart of the datanode it scans everything.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds. Write a new block on completion of the scan for the previously written block.
> Each time the datanode scans a new block, it also scans the previous block, which was already scanned.
> After a restart, the datanode scans all blocks again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
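
For readers following the thread, the bytesLeft accounting problem quoted in the comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual datanode scanner code: the class {{ScanBudgetSketch}}, the {{onLogRoll}} and {{simulate}} methods, and the {{chargeOncePerPeriod}} flag are hypothetical names introduced for illustration. Only the ideas of a per-period bytesLeft budget, a reset in startNewPeriod(), and a check for remaining work come from the discussion.

```java
// Toy model of the accounting described in the quoted explanation.
// NOT the real HDFS code; every name except bytesLeft and startNewPeriod
// is hypothetical and exists only for illustration.
final class ScanBudgetSketch {
    private long bytesLeft;
    private boolean chargedThisPeriod;
    private final boolean chargeOncePerPeriod;

    ScanBudgetSketch(boolean chargeOncePerPeriod) {
        this.chargeOncePerPeriod = chargeOncePerPeriod;
    }

    // The budget is reset only here, once per scan period.
    void startNewPeriod(long budgetBytes) {
        bytesLeft = budgetBytes;
        chargedThisPeriod = false;
    }

    // Stand-in for the work repeated after each log roll: it charges the
    // bytes of already-verified blocks against the period budget.
    void onLogRoll(long alreadyVerifiedBytes) {
        if (chargeOncePerPeriod && chargedThisPeriod) {
            return; // assumed fix: charge at most once per period
        }
        bytesLeft -= alreadyVerifiedBytes;
        chargedThisPeriod = true;
    }

    // Once the budget is not positive, no further scanning happens this period.
    boolean workRemaining() {
        return bytesLeft > 0;
    }

    long bytesLeft() {
        return bytesLeft;
    }

    // Runs one period with the given number of log rolls and returns the
    // remaining budget.
    static long simulate(boolean chargeOnce, long budget, long verified, int rolls) {
        ScanBudgetSketch s = new ScanBudgetSketch(chargeOnce);
        s.startNewPeriod(budget);
        for (int i = 0; i < rolls; i++) {
            s.onLogRoll(verified);
        }
        return s.bytesLeft();
    }

    public static void main(String[] args) {
        System.out.println("charged on every roll: " + simulate(false, 100, 40, 3));
        System.out.println("charged once per period: " + simulate(true, 100, 40, 3));
    }
}
```

With a budget of 100 bytes, 40 already-verified bytes, and three log rolls, charging on every roll drives the budget negative, so the remaining-work check would fail before new blocks are scanned. Charging once per period leaves a positive budget.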