hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2890) HDFS should recover when replicas of block have different sizes (due to corrupted block)
Date Thu, 13 Mar 2008 08:31:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578183#action_12578183 ]

Hadoop QA commented on HADOOP-2890:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12377632/inconsistentSize.patch
against trunk revision 619744.

    @author +1.  The patch does not contain any @author tags.

    tests included -1.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new javac compiler warnings.

    release audit +1.  The applied patch does not generate any new release audit warnings.

    findbugs -1.  The patch appears to cause Findbugs to fail.

    core tests -1.  The patch failed core unit tests.

    contrib tests -1.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1951/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1951/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1951/console

This message is automatically generated.

> HDFS should recover when replicas of block have different sizes (due to corrupted block)
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2890
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2890
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: lohit vijayarenu
>            Assignee: dhruba borthakur
>             Fix For: 0.17.0
>
>         Attachments: inconsistentSize.patch, inconsistentSize.patch, inconsistentSize.patch
>
>
> We had a case where reading a file caused an IOException:
> 08/02/25 17:23:02 INFO fs.DFSClient: Could not obtain block blk_-8333897631311887285 from any node:  java.io.IOException: No live nodes contain current block
> hadoop fsck said the block was healthy.
> [lohit]$ hadoop fsck part-04344 -files -blocks -locations | grep 8333897631311887285
> 21. -8333897631311887285 len=134217728 repl=3 [74.6.129.238:50010, 74.6.133.231:50010, 74.6.128.158:50010]
> Searching the logs for this block turned up this message in the namenode log:
> 17:26:23,543 WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block blk_-8333897631311887285 reported from 74.6.133.231:50010 current size is 134217728 reported size is 134205440
> So the namenode was expecting 134217728 bytes while the reported block size was 134205440 bytes.
> Dhruba looked further at the logs and we found out this is what had happened:
> 1. While the file was being created, this block was replicated to three nodes, of which two had a correctly sized block but the third had a partial/truncated block (though the metadata was the same on all nodes).
> 2. About three days later the namenode was restarted, at which point the third node's block report triggered the warning about the incorrect block size (the namenode logged this).
> 3. A few days after that, the first two nodes went down and the third node replicated the partial/truncated block to two new nodes.
> 4. When we then tried to read this block, we hit the IOException.
> 5. On all the nodes the metadata corresponded to the original valid block, while the block itself was missing around 12K of data.
> Two problems could be fixed here (a sketch follows this quoted description):
> 1. When the namenode identifies replicas with different block sizes (point 2 above), it could choose the biggest replica and discard the smaller one. If the block is not the last block of the file, its size has to equal the configured block size; anything less than that could be considered a bad block.
> 2. The datanode's periodic block verifier could also check that the size recorded in the block metadata matches the actual block present on disk. Any mismatch should be reported/recovered along the lines of the step above.
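
To make the two suggestions above concrete, here is a minimal, hypothetical sketch in Java. The class and method names (ReplicaSizeCheck, reportedReplicaLooksCorrupt, verifyOnDiskLength) are invented for illustration and are not Hadoop's actual FSNamesystem or DataNode APIs; the sketch only shows the size-consistency decisions described in the report.

import java.io.File;
import java.io.IOException;

// Illustrative sketch only: hypothetical helpers, not Hadoop's actual
// FSNamesystem or DataNode code.
public class ReplicaSizeCheck {

  /**
   * Suggestion 1 (namenode side): when a datanode reports a length that
   * differs from the recorded length, prefer the larger replica.  A replica
   * of a non-final block that is shorter than the configured block size is
   * treated as corrupt outright.
   */
  static boolean reportedReplicaLooksCorrupt(long recordedLen, long reportedLen,
                                             long blockSize, boolean isLastBlock) {
    if (reportedLen == recordedLen) {
      return false;                      // sizes agree, nothing to do
    }
    if (!isLastBlock && reportedLen < blockSize) {
      return true;                       // a non-final block must be full sized
    }
    return reportedLen < recordedLen;    // keep the biggest replica, discard the smaller
  }

  /**
   * Suggestion 2 (datanode side): the periodic block verifier could also
   * confirm that the on-disk block file is as long as the length recorded
   * in its metadata, and report any mismatch for recovery.
   */
  static void verifyOnDiskLength(File blockFile, long lengthInMeta) throws IOException {
    long onDisk = blockFile.length();
    if (onDisk != lengthInMeta) {
      throw new IOException("Block file " + blockFile + " is " + onDisk
          + " bytes on disk but metadata records " + lengthInMeta + " bytes");
    }
  }
}

With the numbers from this incident, reportedReplicaLooksCorrupt(134217728, 134205440, 134217728, false) would flag the truncated replica on the third node instead of letting it survive and later re-replicate.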

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

