hadoop-hdfs-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3828) Block Scanner rescans blocks too frequently
Date Thu, 23 Aug 2012 00:27:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439972#comment-13439972 ]

Andy Isaacson commented on HDFS-3828:
-------------------------------------

bq. If the scanner scans exactly once, shouldn't scansLastRun be 0 after this first run? I.e., getBlocksScannedInLastRun shouldn't always return 1, right?

Empirically, it is always 1 after a block has been scanned. This is because when we call {{scanBlockPoolSlice}} and there is nothing to scan, we still do a bunch of useless work:
# creating a new HashMap {{processedBlocks}}
# parsing the verificationLogs and putting the results in the new {{processedBlocks}}
# calling {{scan()}}, which returns immediately
# setting {{totalBlocksScannedInLastRun}} to the resulting size of {{processedBlocks}}
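A simplified model of the four steps above may make the "always 1" result easier to see. This is a hypothetical sketch, not the actual BlockPoolSliceScanner code: the {{ScanPassModel}} class, the map that stands in for the parsed verificationLogs, and the {{scanBlockPoolSliceShortCircuit}} guard are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the wasted work in scanBlockPoolSlice.
// Names mirror the comment (processedBlocks, totalBlocksScannedInLastRun),
// but this is NOT the real BlockPoolSliceScanner implementation.
class ScanPassModel {
  /** Stand-in for the parsed verificationLogs: block ID -> last verification time (ms). */
  private final Map<Long, Long> verificationLog;
  long totalBlocksScannedInLastRun;

  ScanPassModel(Map<Long, Long> verificationLog) {
    this.verificationLog = verificationLog;
  }

  /** Mirrors steps 1-4: rebuilds the map even when no block is due. */
  void scanBlockPoolSlice(long nowMs, long scanPeriodMs) {
    Map<Long, Long> processedBlocks = new HashMap<>();    // step 1: new map
    processedBlocks.putAll(verificationLog);               // step 2: "parse" the logs into it
    scan(nowMs, scanPeriodMs);                             // step 3: returns immediately here
    totalBlocksScannedInLastRun = processedBlocks.size();  // step 4: counts logged blocks, not scans
  }

  /** Hypothetical short-circuit: skip steps 1-4 entirely when nothing is due. */
  void scanBlockPoolSliceShortCircuit(long nowMs, long scanPeriodMs) {
    if (!anyBlockDue(nowMs, scanPeriodMs)) {
      totalBlocksScannedInLastRun = 0;
      return;
    }
    scanBlockPoolSlice(nowMs, scanPeriodMs);
  }

  private boolean anyBlockDue(long nowMs, long scanPeriodMs) {
    for (long lastVerifiedMs : verificationLog.values()) {
      if (nowMs - lastVerifiedMs >= scanPeriodMs) {
        return true;
      }
    }
    return false;
  }

  private void scan(long nowMs, long scanPeriodMs) {
    // Actual block verification is omitted; in the idle case it does nothing.
  }
}
```

With one block verified recently, {{scanBlockPoolSlice}} reports 1 even though nothing was scanned, while the guarded variant reports 0.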

bq. Like the new approach better.

I also like the new code better, but the fact that we can't short-circuit all the nonsense enumerated above in {{scanBlockPoolSlice}} is a bummer. The previous approach avoided doing all of this extra work.

As an alternative, we could propagate a "please wake me up at time T" hint up from BlockPoolSliceScanner to {{DataBlockScanner#run}} and adjust the sleep time there accordingly. If any block pool still has work to do, preserve the existing 5-second sleep; once all block pools are done, DataBlockScanner could sleep for much longer.
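One possible shape for the "wake me up at time T" idea, as a hypothetical sketch: {{WakeHint}}, {{nextWakeTimeMs()}} and {{ScannerSleepPolicy}} are invented names, not existing HDFS APIs, and the policy assumes each block pool scanner can report the earliest time it next has a block due.

```java
import java.util.List;

// Hypothetical hint each block pool scanner could expose to DataBlockScanner#run.
interface WakeHint {
  /** Earliest time (ms) this block pool wants to run again; <= now means it has work now. */
  long nextWakeTimeMs();
}

// Hypothetical sleep policy for the DataBlockScanner run loop.
final class ScannerSleepPolicy {
  static final long DEFAULT_SLEEP_MS = 5_000L;  // the existing 5-second sleep

  private ScannerSleepPolicy() {}

  /**
   * Returns how long the run loop could sleep: the default 5 s if any block pool
   * has work pending, otherwise until the earliest requested wake-up time.
   */
  static long sleepMs(List<? extends WakeHint> pools, long nowMs) {
    if (pools.isEmpty()) {
      return DEFAULT_SLEEP_MS;                    // nothing registered yet: keep old behavior
    }
    long earliest = Long.MAX_VALUE;
    for (WakeHint pool : pools) {
      earliest = Math.min(earliest, pool.nextWakeTimeMs());
    }
    if (earliest <= nowMs) {
      return DEFAULT_SLEEP_MS;                    // some pool still has work: short sleep
    }
    // Every pool is idle: sleep until the first one is due, never less than 5 s.
    return Math.max(DEFAULT_SLEEP_MS, earliest - nowMs);
  }
}
```

Keeping 5 seconds as a floor preserves today's behavior while work is pending; the only change is that a fully idle DataNode stops waking up every few seconds.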
                
> Block Scanner rescans blocks too frequently
> -------------------------------------------
>
>                 Key: HDFS-3828
>                 URL: https://issues.apache.org/jira/browse/HDFS-3828
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.23.0, 2.0.0-alpha
>            Reporter: Andy Isaacson
>            Assignee: Andy Isaacson
>         Attachments: hdfs-3828-1.txt, hdfs3828.txt
>
>
> {{BlockPoolSliceScanner#scan}} calls cleanUp every time it's invoked from {{DataBlockScanner#run}} via {{scanBlockPoolSlice}}. But cleanUp unconditionally roll()s the verificationLogs, so after two iterations we have lost the first iteration of block verification times. As a result, a cluster with just one block rescans it every 10 seconds:
> {noformat}
> 2012-08-16 15:59:57,884 INFO  datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:07,904 INFO  datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> 2012-08-16 16:00:17,925 INFO  datanode.BlockPoolSliceScanner (BlockPoolSliceScanner.java:verifyBlock(391)) - Verification succeeded for BP-2101131164-172.29.122.91-1337906886255:blk_7919273167187535506_4915
> {noformat}
> {quote}
> To fix this, we need to avoid roll()ing the logs multiple times per period.
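The failure mode and the proposed fix can be modeled with a two-generation log. This sketch assumes the verification logs behave like a current/previous pair in which roll() discards the previous generation; {{RollingVerificationLog}}, {{rollIfDue}} and {{rollPeriodMs}} are hypothetical names, and the real logs are files rather than in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-generation verification log (current + previous).
// Illustrative only; not the real HDFS log-handling code.
class RollingVerificationLog {
  private Map<Long, Long> cur = new HashMap<>();   // block ID -> verification time (ms)
  private Map<Long, Long> prev = new HashMap<>();
  private final long rollPeriodMs;                 // assumed roll period
  private long lastRollMs;

  RollingVerificationLog(long rollPeriodMs, long startMs) {
    this.rollPeriodMs = rollPeriodMs;
    this.lastRollMs = startMs;
  }

  void record(long blockId, long verifiedAtMs) {
    cur.put(blockId, verifiedAtMs);
  }

  /** Unconditional roll, as in the bug report: the old "prev" generation is dropped. */
  void roll(long nowMs) {
    prev = cur;
    cur = new HashMap<>();
    lastRollMs = nowMs;
  }

  /** Guarded roll: at most once per period, so frequent cleanUp calls keep history. */
  void rollIfDue(long nowMs) {
    if (nowMs - lastRollMs >= rollPeriodMs) {
      roll(nowMs);
    }
  }

  /** Last known verification time, or null if lost (the block then looks unscanned). */
  Long lastVerified(long blockId) {
    Long t = cur.get(blockId);
    return t != null ? t : prev.get(blockId);
  }
}
```

Two unconditional rolls 5 seconds apart are enough to forget a block's verification time, which matches the repeated rescans in the log above; the guarded variant keeps the entry until a full period has passed.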

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
