hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
Date Wed, 11 Sep 2013 07:02:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764063#comment-13764063 ]

Vinay commented on HDFS-5031:

bq. I ran TestDatanodeBlockScanner#testDuplicateScans without the rest of the code changes
and it continues to pass. Do you see the same?
Yes, I also observed this yesterday. I had missed one assertion; it will be added in the upcoming patch.
bq. I did not understand how the isNewPeriod check works. I will continue to take a look but
meanwhile if someone more familiar with this code wants to chime in please do so.
{{processedBlocks}} is reset on every log roll, but {{bytesLeft}} is reset only in
{{startNewPeriod()}}. As a result, {{bytesLeft}} was being decremented unnecessarily in
{{assignInitialVerificationTimes()}} on every log roll, which drove {{bytesLeft}} to negative
values. Because of this, the scanner was returning at the {{workRemainingInCurrentPeriod()}}
check without scanning the latest blocks. We should decrement it only once, after starting the new period.
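
To illustrate the accounting problem, here is a minimal sketch. It is not the actual BlockPoolSliceScanner code: the {{Budget}} class, the {{onLogRoll*}} methods, the {{chargedThisPeriod}} flag and the byte counts are all hypothetical, and {{workRemainingInCurrentPeriod()}} is reduced to a plain {{bytesLeft > 0}} check. Only the names {{bytesLeft}}, {{startNewPeriod}} and {{workRemainingInCurrentPeriod}} come from the real code.

```java
public class ScanPeriodSketch {

    /** Hypothetical, simplified model of the scanner's per-period byte budget. */
    static final class Budget {
        long bytesLeft;
        private boolean chargedThisPeriod; // assumption: one way to "decrement only once"

        /** Resets the budget; in this model it is the only place bytesLeft is reset. */
        void startNewPeriod(long totalBytesToScan) {
            bytesLeft = totalBytesToScan;
            chargedThisPeriod = false;
        }

        /** Buggy variant: subtracts the already-scanned bytes on every log roll. */
        void onLogRollBuggy(long alreadyScannedBytes) {
            bytesLeft -= alreadyScannedBytes;
        }

        /** Fixed variant: subtracts the already-scanned bytes once per period. */
        void onLogRollFixed(long alreadyScannedBytes) {
            if (!chargedThisPeriod) {
                bytesLeft -= alreadyScannedBytes;
                chargedThisPeriod = true;
            }
        }

        /** Simplified check: the scanner stops once the budget is used up. */
        boolean workRemainingInCurrentPeriod() {
            return bytesLeft > 0;
        }
    }

    /** Starts a period of 100 bytes and applies the given number of log rolls. */
    static Budget simulate(boolean fixed, int logRolls) {
        Budget budget = new Budget();
        budget.startNewPeriod(100);
        for (int i = 0; i < logRolls; i++) {
            if (fixed) {
                budget.onLogRollFixed(40);
            } else {
                budget.onLogRollBuggy(40);
            }
        }
        return budget;
    }

    public static void main(String[] args) {
        Budget buggy = simulate(false, 3);
        Budget fixed = simulate(true, 3);
        System.out.println("buggy: bytesLeft=" + buggy.bytesLeft
                + ", workRemaining=" + buggy.workRemainingInCurrentPeriod());
        System.out.println("fixed: bytesLeft=" + fixed.bytesLeft
                + ", workRemaining=" + fixed.workRemainingInCurrentPeriod());
    }
}
```

With three log rolls in one period, the buggy variant ends with a negative {{bytesLeft}} and reports no remaining work even though 60 bytes are still unscanned, while the fixed variant keeps the budget positive.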

bq. BlockScanInfo#equals looks redundant now. Can we just remove it?
Yes, I will remove in next patch.

bq. In Reader#next, should the assignment to lastReadFile happen after the call to readNext?

{{Reader#next}} does not actually read again before returning; it only returns the previously
read line. So assigning {{lastReadFile}} before calling {{readNext}} is correct.
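
A rough sketch of this lookahead pattern, assuming a reader that buffers one line ahead. This is not the real log reader: the {{Reader}} class below, its {{currentFile}} and {{bufferedLine}} fields and the {{String[]}} entries are hypothetical stand-ins; only {{next}}, {{readNext}} and {{lastReadFile}} are names from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LookaheadReaderSketch {

    /** Hypothetical one-line-lookahead reader over {file, line} pairs. */
    static final class Reader {
        private final Iterator<String[]> source;
        private String currentFile;   // file that bufferedLine was read from
        private String bufferedLine;  // line already read, not yet returned
        String lastReadFile;          // file of the line most recently returned

        Reader(List<String[]> entries) {
            source = entries.iterator();
            readNext(); // pre-read the first line
        }

        boolean hasNext() {
            return bufferedLine != null;
        }

        /** Returns the line that was already buffered, then pre-reads the next one. */
        String next() {
            String line = bufferedLine;
            // Record the file before readNext(), which may move on to another file.
            lastReadFile = currentFile;
            readNext();
            return line;
        }

        private void readNext() {
            if (source.hasNext()) {
                String[] entry = source.next();
                currentFile = entry[0];
                bufferedLine = entry[1];
            } else {
                bufferedLine = null;
            }
        }
    }

    /** Reads all entries and reports "lastReadFile:line" for each returned line. */
    static List<String> trace(List<String[]> entries) {
        Reader reader = new Reader(entries);
        List<String> out = new ArrayList<>();
        while (reader.hasNext()) {
            String line = reader.next();
            out.add(reader.lastReadFile + ":" + line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> entries = Arrays.asList(
                new String[] {"prev.log", "block-1"},
                new String[] {"curr.log", "block-2"});
        System.out.println(trace(entries));
    }
}
```

If the assignment happened after {{readNext()}} in this sketch, the first returned line would be attributed to the file of the line that was just pre-read rather than the file it came from.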
> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch
> BlockScanner scans a block twice, and on restart the datanode scans everything.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds. Write each new block once the scan of the previously written block has completed.
> Each time the datanode scans a new block, it also scans the previous block, which has already been scanned.
> Now, after a restart, the datanode scans all blocks again.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
