hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4810) Data lost at cluster startup time
Date Fri, 12 Dec 2008 08:24:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655918#action_12655918 ]

Konstantin Shvachko commented on HADOOP-4810:
---------------------------------------------

I think that just removing the safe mode condition from invalidateBlock() (point 3) should
solve this particular problem.
During startup (in safe mode) the block with the smaller length will be added to recentInvalidates,
and will not be scheduled for deletion until after safe mode is turned off - this is how
ReplicationMonitor works now.
In general I agree with Hairong that shorter blocks should be considered corrupt, but doing
it now and synchronizing it across 3 versions seems too complicated. It adds one more step before
an incorrect block gets into recentInvalidates: you first place the blocks into corruptReplicas,
and when they are fully replicated they move into recentInvalidates. Verifying
that everything works correctly on that path is hard. I just tried. I would rather incorporate
this into HADOOP-4563.
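
To make the proposed behavior concrete, here is a minimal toy model in Java. It is not Hadoop code: the class InvalidateSketch and the methods leaveSafeMode(), computeInvalidateWork() and pendingCount() are hypothetical names chosen for illustration, and only invalidateBlock() and recentInvalidates echo names from the discussion above. It assumes that queueing a replica needs no safe mode check, because the pass that turns queued entries into delete commands already does nothing while safe mode is on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the proposal; an illustrative assumption, not the FSNamesystem code. */
class InvalidateSketch {
    private boolean safeMode = true; // the name node starts in safe mode

    // datanode -> block ids queued for deletion (stand-in for recentInvalidates)
    private final Map<String, Set<String>> recentInvalidates = new HashMap<>();

    /** Queue a replica for deletion. Deliberately has no safe mode check. */
    void invalidateBlock(String blockId, String datanode) {
        recentInvalidates.computeIfAbsent(datanode, k -> new HashSet<>()).add(blockId);
    }

    void leaveSafeMode() {
        safeMode = false;
    }

    /** Stand-in for the monitor pass: schedules nothing while in safe mode. */
    List<String> computeInvalidateWork() {
        List<String> commands = new ArrayList<>();
        if (safeMode) {
            return commands; // queued replicas stay queued
        }
        for (Map.Entry<String, Set<String>> e : recentInvalidates.entrySet()) {
            for (String blockId : e.getValue()) {
                commands.add("ask " + e.getKey() + " to delete " + blockId);
            }
        }
        recentInvalidates.clear();
        return commands;
    }

    /** Number of replicas still waiting in the queue. */
    int pendingCount() {
        int n = 0;
        for (Set<String> blocks : recentInvalidates.values()) {
            n += blocks.size();
        }
        return n;
    }
}
```

In this model the two short replicas from the log would be queued at block report time instead of failing with SafeModeException, and the delete commands would only be produced after leaveSafeMode() is called.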

> Data lost at cluster startup time
> ---------------------------------
>
>                 Key: HADOOP-4810
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4810
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.2
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: corruptBlocksStartup.patch
>
>
> hadoop dfs -cat file1 returns
> dfs.DFSClient: Could not obtain block blk_XX_0 from any node: java.io.IOException: No live nodes contain current block
> Tracing the history of the block from NN log, we found
> WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block blk_-6160940519231606858_0 reported from A1.A2.A3.A4:50010 current size is 9303872 reported size is 262144
> WARN org.apache.hadoop.fs.FSNamesystem: Deleting block blk_-6160940519231606858_0 from A1.A2.A3.A4:50010
> INFO org.apache.hadoop.dfs.StateChange: DIR* NameSystem.invalidateBlock: blk_-6160940519231606858_0 on A1.A2.A3.A4:50010
> WARN org.apache.hadoop.fs.FSNamesystem: Error in deleting bad block blk_-6160940519231606858_0 org.apache.hadoop.dfs.SafeModeException: Cannot invalidate block blk_-6160940519231606858_0. Name node is in safe mode.
> WARN org.apache.hadoop.fs.FSNamesystem: Inconsistent size for block blk_-6160940519231606858_0 reported from B1.B2.B3.B4:50010 current size is 9303872 reported size is 306688
> WARN org.apache.hadoop.fs.FSNamesystem: Deleting block blk_-6160940519231606858_0 from B1.B2.B3.B4:50010
> INFO org.apache.hadoop.dfs.StateChange: DIR* NameSystem.invalidateBlock: blk_-6160940519231606858_0 on B1.B2.B3.B4:50010
> WARN org.apache.hadoop.fs.FSNamesystem: Error in deleting bad block blk_-6160940519231606858_0 org.apache.hadoop.dfs.SafeModeException: Cannot invalidate block blk_-6160940519231606858_0. Name node is in safe mode.
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (C1.C2.C3.C4:50010, blk_-6160940519231606858_0) is added to recentInvalidateSets
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.chooseExcessReplicates: (D1.D2.D3.D4:50010, blk_-6160940519231606858_0) is added to recentInvalidateSets
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask C1.C2.C3.C4:50010 to delete blk_-6160940519231606858_0
> INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask D1.D2.D3.D4:50010 to delete blk_-6160940519231606858_0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

