hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5031) BlockScanner scans the block multiple times and on restart scans everything
Date Wed, 18 Sep 2013 00:01:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770189#comment-13770189 ]

Arpit Agarwal commented on HDFS-5031:
-------------------------------------

Hi Vinay, thanks for the updated patch. I verified that the new test case fails without your
code changes.

The patch looks good except for one point. I am still not convinced that the assignment to
{{lastReadFile}} before the call to {{readNext}} is correct. Is {{lastReadFile}} meant to
store the file from which the last line was read? If so, the call to {{readNext}} can
change {{file}}. Or did I misunderstand?

{code}
    private void readNext() throws IOException {
...
        if (line == null) {
          // move to the next file.
          if (openFile()) {
            readNext();
          }
{code}
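
To make the question concrete, here is a minimal sketch of a look-ahead reader that spans two files. It is a toy model, not the actual reader code: {{LookaheadReaderSketch}}, {{FakeFile}}, {{fileBeforeReadNext}} and {{fileAfterReadNext}} are hypothetical names, and it assumes that {{next()}} returns a line that was buffered by an earlier {{readNext()}} call. Under that assumption, a value captured before {{readNext}} and one captured after it can differ at a file boundary.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.NoSuchElementException;

// Toy model only; every name here is hypothetical, not taken from the patch.
final class LookaheadReaderSketch {
    // Stand-in for a log file: a name plus its remaining lines.
    static final class FakeFile {
        final String name;
        final Deque<String> lines;
        FakeFile(String name, String... lines) {
            this.name = name;
            this.lines = new ArrayDeque<>(Arrays.asList(lines));
        }
    }

    private final Deque<FakeFile> pending;
    private FakeFile file;      // file the buffered line came from
    private String line;        // buffered next line, or null when exhausted
    String fileBeforeReadNext;  // snapshot taken before readNext()
    String fileAfterReadNext;   // snapshot taken after readNext()

    LookaheadReaderSketch(FakeFile... files) {
        pending = new ArrayDeque<>(Arrays.asList(files));
        if (openFile()) {
            readNext();
        }
    }

    private boolean openFile() {
        file = pending.poll();
        return file != null;
    }

    private void readNext() {
        line = (file == null) ? null : file.lines.poll();
        if (line == null && file != null) {
            // Move to the next file; this is where `file` can change.
            if (openFile()) {
                readNext();
            }
        }
    }

    String next() {
        if (line == null) {
            throw new NoSuchElementException();
        }
        String current = line;
        fileBeforeReadNext = file.name;
        readNext();
        fileAfterReadNext = (file == null) ? null : file.name;
        return current;
    }

    // Returns {line returned, file seen before readNext, file seen after}
    // for the call that crosses from "prev" to "curr".
    static String[] boundarySnapshot() {
        LookaheadReaderSketch r = new LookaheadReaderSketch(
            new FakeFile("prev", "a", "b"), new FakeFile("curr", "c"));
        r.next();                    // returns "a"; no file change
        String returned = r.next();  // returns "b", the last line of "prev"
        return new String[] {returned, r.fileBeforeReadNext, r.fileAfterReadNext};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(boundarySnapshot()));
    }
}
```

In this toy model the two snapshots disagree only on the call that crosses the file boundary, so which one is right depends on whether {{lastReadFile}} should name the file of the line just returned or the file of the line buffered for the next call.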

{quote}
processedBlocks is reset on every log roll, but bytesLeft is reset only on
every startNewPeriod(), so on every log roll bytesLeft was being decremented unnecessarily
in assignInitialVerificationTimes(), which resulted in negative values of bytesLeft. Because
of this, scanning was returning from workRemainingInCurrentPeriod() without scanning the latest
blocks. We should decrement it only once after starting the new period.
{quote}

Thanks for the explanation, I understand what you are trying to fix now.
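
As I understand the explanation, the arithmetic can be sketched as below. This is a simplified model under stated assumptions, not the scanner code: {{ScanBudgetSketch}}, {{onLogRoll}}, {{workRemaining}} and {{simulate}} are hypothetical names, the byte counts are made up for illustration, and the real {{workRemainingInCurrentPeriod()}} may check more than the sign of {{bytesLeft}}.

```java
// Simplified model of a per-period byte budget; all names are hypothetical.
final class ScanBudgetSketch {
    private final boolean chargeOncePerPeriod;
    private long bytesLeft;
    private boolean chargedThisPeriod;

    ScanBudgetSketch(boolean chargeOncePerPeriod) {
        this.chargeOncePerPeriod = chargeOncePerPeriod;
    }

    // The budget is reset only here, once per scan period.
    void startNewPeriod(long totalBytes) {
        bytesLeft = totalBytes;
        chargedThisPeriod = false;
    }

    // Stands in for the decrement described in the quote.
    void onLogRoll(long alreadyScannedBytes) {
        if (chargeOncePerPeriod && chargedThisPeriod) {
            return; // already accounted for in this period
        }
        bytesLeft -= alreadyScannedBytes;
        chargedThisPeriod = true;
    }

    // Assumption: the scanner stops once the budget is used up.
    boolean workRemaining() {
        return bytesLeft > 0;
    }

    long bytesLeft() {
        return bytesLeft;
    }

    // Runs one period with the given number of log rolls and returns bytesLeft.
    static long simulate(boolean chargeOnce, long total, long scanned, int rolls) {
        ScanBudgetSketch budget = new ScanBudgetSketch(chargeOnce);
        budget.startNewPeriod(total);
        for (int i = 0; i < rolls; i++) {
            budget.onLogRoll(scanned);
        }
        return budget.bytesLeft();
    }

    public static void main(String[] args) {
        // 100 bytes in the period, 60 already scanned, 3 log rolls.
        System.out.println(simulate(false, 100, 60, 3)); // -80: charged on every roll
        System.out.println(simulate(true, 100, 60, 3));  // 40: charged once per period
    }
}
```

With the decrement applied on every roll the budget goes negative after the second roll, so a check like {{workRemaining()}} reports no work even though new blocks are waiting; charging once per period keeps the budget positive.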

                
> BlockScanner scans the block multiple times and on restart scans everything
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-5031
>                 URL: https://issues.apache.org/jira/browse/HDFS-5031
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Vinay
>            Assignee: Vinay
>         Attachments: HDFS-5031.patch, HDFS-5031.patch, HDFS-5031.patch
>
>
> BlockScanner scans the block twice; also, on restart of the datanode it scans everything.
> Steps:
> 1. Write blocks at intervals of more than 5 seconds. Write a new block on completion of the scan for the previously written block.
> Each time the datanode scans a new block, it also scans the previous block, which has already been scanned.
> Now after restart, the datanode scans all blocks again.

