hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
Date Wed, 11 Sep 2013 07:02:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764063#comment-13764063 ]

Vinay commented on HDFS-5031:
-----------------------------

bq. I ran TestDatanodeBlockScanner#testDuplicateScans without the rest of the code changes
and it continues to pass. Do you see the same?
Yes, I observed the same yesterday. I had missed one assertion; it will be added in the upcoming patch.
bq. I did not understand how the isNewPeriod check works. I will continue to take a look but
meanwhile if someone more familiar with this code wants to chime in please do so.
{{processedBlocks}} is reset on every log roll, but {{bytesLeft}} is reset only on every
{{startNewPeriod()}}. So on every log roll {{bytesLeft}} was being decremented unnecessarily
in {{assignInitialVerificationTimes()}}, which resulted in negative values of {{bytesLeft}}.
Because of this, scanning was returning from {{workRemainingInCurrentPeriod()}} without
scanning the latest blocks. We should decrement it only once, after starting the new period.
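
A minimal sketch of the accounting described above, not the actual HDFS code. The class {{ScanBudgetSketch}}, the {{decrementOncePerPeriod}} flag, the {{simulate}} helper and the byte counts are all hypothetical; only the method names are borrowed from the scanner, and {{workRemainingInCurrentPeriod()}} is reduced here to a plain check on {{bytesLeft}}.

```java
// Hypothetical model of the bytesLeft bookkeeping; not the HDFS implementation.
class ScanBudgetSketch {
    private long bytesLeft;                       // scan budget left in this period
    private boolean newPeriod;                    // true until the first assignment
    private final boolean decrementOncePerPeriod; // false = old behaviour, true = proposed

    ScanBudgetSketch(boolean decrementOncePerPeriod) {
        this.decrementOncePerPeriod = decrementOncePerPeriod;
    }

    // Resets the budget; in this sketch it is the only place bytesLeft is reset.
    void startNewPeriod(long totalBytes) {
        bytesLeft = totalBytes;
        newPeriod = true;
    }

    // Invoked on every log roll in this sketch.
    void assignInitialVerificationTimes(long alreadyVerifiedBytes) {
        if (!decrementOncePerPeriod || newPeriod) {
            bytesLeft -= alreadyVerifiedBytes;
        }
        newPeriod = false;
    }

    // Simplified assumption: work remains only while the budget is positive.
    boolean workRemainingInCurrentPeriod() {
        return bytesLeft > 0;
    }

    long getBytesLeft() {
        return bytesLeft;
    }

    // Starts one period of 100 bytes, then rolls the log logRolls times,
    // each roll reporting 40 already-verified bytes.
    static ScanBudgetSketch simulate(boolean decrementOncePerPeriod, int logRolls) {
        ScanBudgetSketch s = new ScanBudgetSketch(decrementOncePerPeriod);
        s.startNewPeriod(100);
        for (int i = 0; i < logRolls; i++) {
            s.assignInitialVerificationTimes(40);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(simulate(false, 3).getBytesLeft()); // -20
        System.out.println(simulate(true, 3).getBytesLeft());  // 60
    }
}
```

With three log rolls in one period, decrementing on every roll drives the budget to -20, so the simplified {{workRemainingInCurrentPeriod()}} reports no work left; decrementing once leaves 60.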

bq. BlockScanInfo#equals looks redundant now. Can we just remove it?
Yes, I will remove it in the next patch.

bq. In Reader#next, should the assignment to lastReadFile happen after the call to readNext?

{{Reader#next}} does not actually read a line and return it; it returns only the previously
read line. So assigning {{lastReadFile}} before {{readNext}} is correct.
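
The look-ahead pattern behind this answer can be sketched as follows. This is an illustrative in-memory model, not the real reader: the class {{LookaheadReaderSketch}}, its list-of-lists "files" and the integer file index are assumptions made for the example.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical look-ahead reader over several in-memory "files".
class LookaheadReaderSketch implements Iterator<String> {
    private final List<List<String>> files; // each inner list holds one file's lines
    private int fileIdx = 0;                // file the prefetched line came from
    private int lineIdx = 0;                // next line to read within that file
    private String line;                    // prefetched line, null when exhausted
    private int lastReadFile = -1;          // file of the line last returned by next()

    LookaheadReaderSketch(List<List<String>> files) {
        this.files = files;
        readNext(); // prefetch the first line
    }

    // Prefetches the next line, moving on to the next file when one runs out.
    private void readNext() {
        line = null;
        while (fileIdx < files.size()) {
            List<String> cur = files.get(fileIdx);
            if (lineIdx < cur.size()) {
                line = cur.get(lineIdx++);
                return;
            }
            fileIdx++;
            lineIdx = 0;
        }
    }

    @Override
    public boolean hasNext() {
        return line != null;
    }

    @Override
    public String next() {
        if (line == null) {
            throw new NoSuchElementException();
        }
        String curLine = line;
        // Record the file BEFORE readNext(): readNext() may advance fileIdx,
        // but curLine still belongs to the file that was current until now.
        lastReadFile = fileIdx;
        readNext();
        return curLine;
    }

    int getLastReadFile() {
        return lastReadFile;
    }
}
```

In this sketch, if the assignment came after {{readNext()}}, the last line of one file would be attributed to the following file.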
                
> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans each block twice, and on datanode restart it scans everything again.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds. Write each new block once the scan of the previously written block has completed.
> Each time the datanode scans a new block, it also scans the previous block, which has already been scanned.
> Now, after a restart, the datanode scans all blocks again.
