hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3157) Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened
Date Mon, 18 Jun 2012 21:27:43 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396273#comment-13396273 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3157:
----------------------------------------------

Let me see if I understand the patch correctly:
# The new BlockInfo(storedBlock) constructor won't copy the triplets, so the blockInfo in BlockToMarkCorrupt has the DN's GS but does not have locations.
# In markBlockAsCorrupt(..), since the locations could be empty and the GS could differ from the one in the blocksMap, we look up the block again.  (A toy sketch of this follows.)
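
To make that concrete, here is a small toy sketch.  ToyBlock and ToyBlocksMap below are stand-ins I made up for this comment, not the real BlockInfo/BlocksMap classes; the only point is to show why the copy carries the reported GS but no locations, so the stored block has to be looked up again by id:
{code}
// Toy model only -- ToyBlock/ToyBlocksMap are stand-ins, not BlockInfo/BlocksMap.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyBlock {
  final long id;
  final long genStamp;
  final List<String> locations = new ArrayList<String>();  // stand-in for the triplets

  ToyBlock(long id, long genStamp) {
    this.id = id;
    this.genStamp = genStamp;
  }

  // Analogous to the new BlockInfo(storedBlock) constructor: copies the id and
  // the reported (DN) genstamp, but does NOT copy the location triplets.
  ToyBlock(ToyBlock reported) {
    this(reported.id, reported.genStamp);
  }
}

class ToyBlocksMap {
  private final Map<Long, ToyBlock> byId = new HashMap<Long, ToyBlock>();

  void add(ToyBlock b) {
    byId.put(b.id, b);
  }

  // Lookup is by block id only, so it finds the stored block even when the
  // reported genstamp (e.g. 1002) differs from the stored one (e.g. 1003).
  ToyBlock getStoredBlock(ToyBlock reported) {
    return byId.get(reported.id);
  }
}
{code}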

If my understanding is correct, I have the following suggestions:
- Add storedBlock to BlockToMarkCorrupt so that no additional lookup is required.
- We have to be very careful about when to use the block with the stored GS and when to use the block with the reported GS.  In markBlockAsCorrupt(..), calls to addToCorruptReplicasMap and addToInvalidates should pass the block with the DN's GS.  Other calls (addBlock, countNodes, updateNeededReplications) should pass the block with the stored GS.  Similar changes have to be made in invalidateBlock(..).  (See the sketch right after this list.)
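
A toy sketch of that split, reusing the ToyBlock stand-in above (the real signatures in BlockManager differ; this only shows which call site gets which GS):
{code}
// Toy sketch with made-up names: keep both the stored block and the reported
// block so markBlockAsCorrupt(..) never has to look the block up again.
class ToyBlockToMarkCorrupt {
  final ToyBlock stored;    // block as the NN knows it: stored GS, has locations
  final ToyBlock reported;  // block as the DN reported it: DN's GS, no locations
  final String reason;

  ToyBlockToMarkCorrupt(ToyBlock stored, ToyBlock reported, String reason) {
    this.stored = stored;
    this.reported = reported;
    this.reason = reason;
  }
}

class ToyBlockManager {
  void markBlockAsCorrupt(ToyBlockToMarkCorrupt b, String datanode) {
    // Namespace-side bookkeeping works against the stored block (stored GS).
    addBlock(datanode, b.stored);
    countNodes(b.stored);
    updateNeededReplications(b.stored);
    // What the DN is asked to invalidate must carry the DN's GS, otherwise the
    // DN cannot find the replica and logs "Error in deleting blocks".
    addToCorruptReplicasMap(b.reported, datanode);
    addToInvalidates(b.reported, datanode);
  }

  // Empty stubs so the toy compiles; the real methods live in BlockManager.
  void addBlock(String dn, ToyBlock blk) {}
  void countNodes(ToyBlock blk) {}
  void updateNeededReplications(ToyBlock blk) {}
  void addToCorruptReplicasMap(ToyBlock blk, String dn) {}
  void addToInvalidates(ToyBlock blk, String dn) {}
}
{code}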

It is lengthy to describe all the changes, so I have put them in h3157_20120618.patch.  Ashish, could you combine it with your test if you think the changes are good?

-----

I think there are similar bugs in processMisReplicatedBlock(..) and the related code, since they do not handle the case where the generation stamps differ.  This is new code introduced for HA; let's fix it separately.
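
For illustration only (again with the ToyBlock stand-in, not the actual HA code path), the kind of check that seems to be missing there:
{code}
// Toy illustration: a reported replica should only count as a live replica of
// the stored block when the genstamps match; a mismatch means a stale replica
// that needs separate handling.
class ToyMisReplicationCheck {
  static boolean isLiveReplica(ToyBlock stored, ToyBlock reported) {
    return stored != null
        && stored.id == reported.id
        && stored.genStamp == reported.genStamp;
  }
}
{code}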

                
> Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3157
>                 URL: https://issues.apache.org/jira/browse/HDFS-3157
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: J.Andreina
>            Assignee: Ashish Singhi
>         Attachments: HDFS-3157-1.patch, HDFS-3157-1.patch, HDFS-3157-2.patch, HDFS-3157-3.patch, HDFS-3157-3.patch, HDFS-3157-4.patch, HDFS-3157.patch, HDFS-3157.patch, HDFS-3157.patch
>
>
> Cluster setup:
> 1 NN, three DNs (DN1, DN2, DN3), replication factor 2, "dfs.blockreport.intervalMsec" 300, "dfs.datanode.directoryscan.interval" 1
> step 1: write one file "a.txt" with sync (not closed)
> step 2: Delete the blocks in one of the datanodes, say DN1 (from rbw), to which replication happened.
> step 3: close the file.
> Since the replication factor is 2, the blocks are replicated to the other datanode.
> Then, on the NN side, the following command is issued to the DN from which the block was deleted
> -------------------------------------------------------------------------------------
> {noformat}
> 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX because reported RBW replica with genstamp 1002 does not match COMPLETE block's genstamp in block map 1003
> 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_2903555284838653156_1003 from neededReplications as it has enough replicas.
> {noformat}
> From the datanode side on which the block was deleted, the following exception occurred
> {noformat}
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block blk_2903555284838653156_1003. BlockInfo not found in volumeMap.
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
> java.io.IOException: Error in deleting blocks.
> 	at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
> 	at java.lang.Thread.run(Thread.java:619)
> {noformat}


        
