Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 18 Sep 2013 00:01:52 +0000 (UTC)
From: "Arpit Agarwal (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12659793.1374755647225.150929.1379462512449@arcas>
In-Reply-To: <JIRA.12659793.1374755647225@arcas>
References: <JIRA.12659793.1374755647225@arcas>
Subject: [jira] [Commented] (HDFS-5031) BlockScanner scans the block
 multiple times and on restart scans everything
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770189#comment-13770189 ] 

Arpit Agarwal commented on HDFS-5031:
-------------------------------------

Hi Vinay, thanks for the updated patch. I verified that the new test case fails without your code changes.

The patch looks good except for one point. I am still not convinced that assigning {{lastReadFile}} before the call to {{readNext}} is correct. Is {{lastReadFile}} meant to store the file from which the last line was read? If so, the call to {{readNext}} can change {{file}}; or have I misunderstood?

{code}
    private void readNext() throws IOException {
...
        if (line == null) {
          // move to the next file.
          if (openFile()) {
            readNext();
          }
{code}
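One way to sidestep the ordering question, assuming {{lastReadFile}} is meant to name the file that produced the most recently returned line, is to capture the file at the moment each line is actually read, so a roll-over inside {{readNext}} cannot skew it. The sketch below is a hypothetical, self-contained stand-in, not the DataNode scanner code: {{RollingLineReader}}, {{lineFile}}, {{getLastReadFile}} and the in-memory map of file contents are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical reader over a sequence of rolled log files (illustration only).
 * Like the readNext() excerpt above, it prefetches one line ahead, but it pairs
 * each prefetched line with the file it came from.
 */
final class RollingLineReader {
  private final Deque<String> pending;              // files not yet opened
  private final Map<String, List<String>> contents; // stand-in for real file I/O
  private String file;              // file currently open
  private Iterator<String> cursor;  // position within the current file
  private String line;              // prefetched line, or null when exhausted
  private String lineFile;          // file that produced `line`
  private String lastReadFile;      // file that produced the last returned line

  RollingLineReader(List<String> order, Map<String, List<String>> contents) {
    this.pending = new ArrayDeque<>(order);
    this.contents = contents;
    if (openFile()) {
      readNext();
    }
  }

  private boolean openFile() {
    file = pending.poll(); // null when no files remain
    cursor = (file == null) ? null : contents.get(file).iterator();
    return file != null;
  }

  private void readNext() {
    line = (cursor != null && cursor.hasNext()) ? cursor.next() : null;
    if (line == null) {
      // Current file exhausted: move to the next file (this changes `file`).
      if (openFile()) {
        readNext();
      }
    } else {
      lineFile = file; // captured when the line is actually read
    }
  }

  boolean hasNext() {
    return line != null;
  }

  String next() {
    String result = line;
    lastReadFile = lineFile; // file of `result`, unaffected by the prefetch below
    readNext();              // may roll over to the next file
    return result;
  }

  String getLastReadFile() {
    return lastReadFile;
  }
}
```

With files {{log.prev}} = [a, b] and {{log.cur}} = [c], returning "b" prefetches "c" from {{log.cur}}, yet {{getLastReadFile()}} still reports {{log.prev}}, which is the file "b" came from.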

{quote}
processedBlocks is reset on every log roll, but bytesLeft is reset only in startNewPeriod(). So on every log roll, bytesLeft was unnecessarily decremented again in assignInitialVerificationTimes(), which drove bytesLeft negative. As a result, workRemainingInCurrentPeriod() returned early without scanning the latest blocks. We should decrement it only once, after starting the new period.
{quote}

Thanks for the explanation, I understand what you are trying to fix now.
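To make the accounting problem concrete, here is a small hypothetical model: {{processedBlocks}} is cleared on every log roll, {{bytesLeft}} is reset only in {{startNewPeriod()}}, and {{assignInitialVerificationTimes()}} subtracts an amount for blocks already scanned (assumed here to be a single byte count). Only those field and method names come from the discussion; their bodies, the class name, {{rollLog}}, {{markProcessed}} and the {{deductOncePerPeriod}} switch are invented for illustration and are not the DataNode code.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of per-period scan accounting (illustration only).
 * deductOncePerPeriod = false models the old behavior, true the proposed fix.
 */
final class ScanPeriodAccounting {
  private final long totalBytes;             // bytes to verify per scan period
  private final boolean deductOncePerPeriod; // true = deduct once, false = every roll
  private final Set<Long> processedBlocks = new HashSet<>(); // cleared on every log roll
  private long bytesLeft;                    // reset only by startNewPeriod()
  private boolean deductedThisPeriod;        // has the one-time deduction happened?

  ScanPeriodAccounting(long totalBytes, boolean deductOncePerPeriod) {
    this.totalBytes = totalBytes;
    this.deductOncePerPeriod = deductOncePerPeriod;
    startNewPeriod();
  }

  void startNewPeriod() {
    bytesLeft = totalBytes;
    deductedThisPeriod = false;
  }

  void markProcessed(long blockId) {
    processedBlocks.add(blockId);
  }

  /** Log roll: processedBlocks starts over, but bytesLeft is NOT reset. */
  void rollLog(long bytesAlreadyScanned) {
    processedBlocks.clear();
    assignInitialVerificationTimes(bytesAlreadyScanned);
  }

  private void assignInitialVerificationTimes(long bytesAlreadyScanned) {
    // Old behavior: subtract on every roll, so repeated rolls within one
    // period keep shrinking bytesLeft until it goes negative.
    // Fixed behavior: subtract only once after the period starts.
    if (!deductOncePerPeriod || !deductedThisPeriod) {
      bytesLeft -= bytesAlreadyScanned;
      deductedThisPeriod = true;
    }
  }

  boolean workRemainingInCurrentPeriod() {
    return bytesLeft > 0;
  }

  long bytesLeft() {
    return bytesLeft;
  }

  int processedBlockCount() {
    return processedBlocks.size();
  }
}
```

With a 100-byte period, 40 bytes already scanned and three log rolls, the guarded model ends with 60 bytes left, while the unguarded one reaches -20 and reports no remaining work, matching the early return from {{workRemainingInCurrentPeriod()}} described above.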

                
> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans the block twice; also, on datanode restart, it scans everything.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds, writing each new block only after the scan of the previously written block has completed.
> Each time the datanode scans a new block, it also rescans the previous block, which has already been scanned.
> After a restart, the datanode scans all blocks again.
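The expected behavior in these steps (each block verified once per scan period, and not again merely because the datanode restarted) can be sketched as a scheduler that remembers each block's last verification time and restores it on startup. This is a hypothetical illustration: {{ScanScheduler}}, {{blocksDue}}, {{recordScan}}, {{snapshot}} and the {{NEVER}} sentinel are invented names, and the in-memory snapshot stands in for whatever state the real scanner persists on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical scan scheduler (illustration only): selects just the blocks
 * whose last scan is older than the scan period. Constructing it from a
 * persisted snapshot models a datanode restart.
 */
final class ScanScheduler {
  private static final long NEVER = -1L; // sentinel: block not yet scanned
  private final long scanPeriodMillis;
  private final Map<Long, Long> lastScanTime = new HashMap<>(); // blockId -> millis

  ScanScheduler(long scanPeriodMillis, Map<Long, Long> persisted) {
    this.scanPeriodMillis = scanPeriodMillis;
    lastScanTime.putAll(persisted); // restored state; empty on a fresh start
  }

  void addBlock(long blockId) {
    lastScanTime.putIfAbsent(blockId, NEVER);
  }

  /** Blocks never scanned, or last scanned at least one period before `now`. */
  List<Long> blocksDue(long now) {
    List<Long> due = new ArrayList<>();
    for (Map.Entry<Long, Long> e : lastScanTime.entrySet()) {
      long last = e.getValue();
      if (last == NEVER || now - last >= scanPeriodMillis) {
        due.add(e.getKey());
      }
    }
    Collections.sort(due);
    return due;
  }

  void recordScan(long blockId, long now) {
    lastScanTime.put(blockId, now);
  }

  /** State to persist so a restarted scheduler does not rescan everything. */
  Map<Long, Long> snapshot() {
    return new HashMap<>(lastScanTime);
  }
}
```

After block 1 is scanned, adding block 2 makes only block 2 due; and a scheduler rebuilt from {{snapshot()}} reports nothing due until a full period has elapsed, instead of rescanning every block.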

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira