hadoop-hdfs-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3194) DataNode block scanner is running too frequently
Date Tue, 21 Aug 2012 19:39:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438977#comment-13438977 ]

Andy Isaacson commented on HDFS-3194:
-------------------------------------

bq. Latest patch from Amith should address this issue. 

I assume you're referring to HDFS-3194_7.patch .

I've asked before for a description of how it solves the problem, because that was obvious
neither from the discussion nor from reading the diff.  I'm disappointed that nobody responded
to that request, so I've gone and read the patch in detail, and I think I can finally explain
the approach 7.patch is using.

The problem 7.patch is solving is this: currently, when we finish scanning a BP, we unconditionally
rotate the log.  We keep two logs, the previous and the current.  We sometimes re-scan
a BP before the scanPeriod completes.  If a rescan happens twice within a single scanPeriod,
both logs will rotate away and we will forget which blocks we previously scanned, so we will
scan the first blocks again.

To fix this, 7.patch delays rotating the logs until the log has reached a predetermined size,
rather than rotating when the scan completes.
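
As a rough illustration of the difference between the two rotation policies, here is a minimal
sketch.  It is not the code in 7.patch: the class RollingLogSketch, its methods, and the in-memory
queues standing in for the current and .prev log files are all hypothetical, chosen only to
show why rotating on every scan completion loses history while rotating on size does not.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of a two-generation verification log; not an HDFS class. */
final class RollingLogSketch {
    private final int maxEntries;                      // assumed size threshold for rotation
    private Deque<String> current = new ArrayDeque<>();
    private Deque<String> previous = new ArrayDeque<>();

    RollingLogSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Old policy: rotate unconditionally whenever a scan of the block pool finishes. */
    void rotateOnScanComplete() {
        rotate();
    }

    /** Size-based policy: rotate only when the current log is already full. */
    void append(String blockId) {
        if (current.size() >= maxEntries) {
            rotate();
        }
        current.addLast(blockId);
    }

    /** True if either generation of the log still records this block. */
    boolean remembers(String blockId) {
        return current.contains(blockId) || previous.contains(blockId);
    }

    private void rotate() {
        previous = current;            // the old "previous" generation is discarded here
        current = new ArrayDeque<>();
    }
}
```

In this model, two calls to rotateOnScanComplete() after appending a block are enough to forget
it, which matches the double-rescan case described above.  With the size-based policy the entry
survives until enough new entries have been appended to force two rotations.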

{code}
+  static final int verficationLogLimit = 5;
{code}
What does this constant do?  It seems to govern the block verification log size, but I don't
understand why we want to keep 5 log entries for every block in blockMap.
{code}
+  private static long BLOCK_SCAN_PERIOD_UNIT = 3600 * 1000;
...
-    this.scanPeriod = hours * 3600 * 1000;
+    this.scanPeriod = hours * BLOCK_SCAN_PERIOD_UNIT;
{code}
I don't think adding a named constant here is an improvement, but if you feel that it helps,
please use a more descriptive name for this constant, like MS_PER_HOUR or something similar.
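
For comparison, a small sketch of what a descriptively named constant could look like, next to
the standard-library alternative java.util.concurrent.TimeUnit.  The class ScanPeriodSketch and
its method names are hypothetical and not part of the patch; the long literals are my own choice
so the multiplication is done in 64-bit arithmetic.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of HDFS-3194_7.patch. */
final class ScanPeriodSketch {
    // Name states the unit conversion it performs.
    static final long MS_PER_HOUR = 3600L * 1000L;

    /** Converts a scan period in hours to milliseconds using the named constant. */
    static long scanPeriodMs(long hours) {
        return hours * MS_PER_HOUR;
    }

    /** Same conversion using the JDK's TimeUnit, which needs no constant at all. */
    static long scanPeriodMsViaTimeUnit(long hours) {
        return TimeUnit.HOURS.toMillis(hours);
    }
}
```

Both methods give the same result for the 21-day (504-hour) period mentioned in the issue
description.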

Uma, Amith -- Have you tested 7.patch with multiple block pools and a full cluster restart?
I think the changed code will leave a dncp_block_verification_log.prev in multiple BP directories,
and I suspect that the BlockPoolSliceScanner might resume from the wrong place if there are
multiple verification_logs in the data directories.

Per Eli's request, I'm going to close this Jira as resolved by my one-line patch, which fixes
the "Block scanner runs too frequently" bug, and open a new Jira to track the "Block scanner
repeatedly rescans blocks" bug, which is addressed by HDFS-3194_7.patch.
                
> DataNode block scanner is running too frequently
> ------------------------------------------------
>
>                 Key: HDFS-3194
>                 URL: https://issues.apache.org/jira/browse/HDFS-3194
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: suja s
>            Assignee: Andy Isaacson
>             Fix For: 2.2.0-alpha
>
>         Attachments: HDFS-3194_1.patch, hdfs-3194-1.txt, HDFS-3194_2.patch, HDFS-3194_4.patch, HDFS-3194_6.patch, HDFS-3194_7.patch, HDFS-3194.patch
>
>
> The block scanning interval should default to 21 days (3 weeks), and each block should be scanned once every 21 days.
> Here the block is being scanned continuously.
> 2012-04-03 10:44:47,056 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:02,064 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:17,071 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473_1003
> 2012-04-03 10:45:32,079 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-241703115-xx.xx.xx.55-1333086229434:blk_-2666054955039014473

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
