hadoop-hdfs-issues mailing list archives

From "Ashish Singhi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3157) Error in deleting block is keep on coming from DN even after the block report and directory scanning has happened
Date Wed, 30 May 2012 03:53:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285368#comment-13285368 ]

Ashish Singhi commented on HDFS-3157:
-------------------------------------

I forgot to mention that I have used
{code}+      this.triplets = from.triplets;{code}

instead of 
{code}+      System.arraycopy(from.triplets, 0, this.triplets, 0, from.triplets.length);{code}

System.arraycopy copies the elements into a separate array, so this.triplets and from.triplets
no longer refer to the same array. The problem then shows up in markBlockAsCorrupt(...):
at node.addBlock(storedBlock) we add the datanode to the triplets of corruptBlock, but when
countNodes(...) looks up storedBlock in blockMap, it gets an iterator over only one datanode,
i.e., the one holding the live replica.
To avoid this I have used this.triplets = from.triplets, so that both point to the same
array and the problem described above does not occur.
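
The difference between the two statements can be shown with a minimal, self-contained
sketch. Holder, copyElements and shareArray are hypothetical names made up for this
illustration; they are not the real BlockInfo code or the patch itself, and only mimic
the copy-versus-alias behaviour of a triplets-style array.

```java
import java.util.Arrays;

public class TripletsAliasDemo {

    /** Hypothetical stand-in for an object holding a triplets array (not the real BlockInfo). */
    static class Holder {
        Object[] triplets;

        Holder(int size) {
            this.triplets = new Object[size];
        }
    }

    /** Copies the elements: the result owns a separate array. */
    static Holder copyElements(Holder from) {
        Holder to = new Holder(from.triplets.length);
        System.arraycopy(from.triplets, 0, to.triplets, 0, from.triplets.length);
        return to;
    }

    /** Shares the array: both holders see every later write. */
    static Holder shareArray(Holder from) {
        Holder to = new Holder(0);
        to.triplets = from.triplets;
        return to;
    }

    public static void main(String[] args) {
        Holder stored = new Holder(2);
        stored.triplets[0] = "DN2"; // the datanode with the live replica

        // Write through an element-wise copy: the original array is unchanged.
        Holder copied = copyElements(stored);
        copied.triplets[1] = "DN1";
        System.out.println("after arraycopy:  " + Arrays.toString(stored.triplets));
        // prints "after arraycopy:  [DN2, null]"

        // Write through a shared reference: the original sees the new entry.
        Holder shared = shareArray(stored);
        shared.triplets[1] = "DN1";
        System.out.println("after assignment: " + Arrays.toString(stored.triplets));
        // prints "after assignment: [DN2, DN1]"
    }
}
```

With the element-wise copy, an entry added through the copy never reaches the original
array, which matches the single-datanode iterator described above. Sharing the reference
makes the write visible through both objects, at the cost that the two can no longer be
modified independently.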
                
> Error in deleting block is keep on coming from DN even after the block report and directory
scanning has happened
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3157
>                 URL: https://issues.apache.org/jira/browse/HDFS-3157
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: J.Andreina
>            Assignee: Ashish Singhi
>         Attachments: HDFS-3157-1.patch, HDFS-3157-1.patch, HDFS-3157-2.patch, HDFS-3157.patch, HDFS-3157.patch, HDFS-3157.patch
>
>
> Cluster setup:
> 1 NN, three DNs (DN1, DN2, DN3), replication factor 2, "dfs.blockreport.intervalMsec" 300, "dfs.datanode.directoryscan.interval" 1
> step 1: Write one file "a.txt" with sync (not closed).
> step 2: Delete the blocks (from rbw) on one of the datanodes to which replication happened, say DN1.
> step 3: Close the file.
> Since the replication factor is 2, the blocks are replicated to the other datanode.
> Then, on the NN side, the following command is issued to the DN from which the block was deleted:
> -------------------------------------------------------------------------------------
> {noformat}
> 2012-03-19 13:41:36,905 INFO org.apache.hadoop.hdfs.StateChange: BLOCK NameSystem.addToCorruptReplicasMap: duplicate requested for blk_2903555284838653156 to add as corrupt on XX.XX.XX.XX by /XX.XX.XX.XX because reported RBW replica with genstamp 1002 does not match COMPLETE block's genstamp in block map 1003
> 2012-03-19 13:41:39,588 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* Removing block blk_2903555284838653156_1003 from neededReplications as it has enough replicas.
> {noformat}
> On the datanode from which the block was deleted, the following exception occurred:
> {noformat}
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error trying to delete block blk_2903555284838653156_1003. BlockInfo not found in volumeMap.
> 2012-02-29 13:54:13,126 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
> java.io.IOException: Error in deleting blocks.
> 	at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:2061)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:581)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:545)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:690)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:522)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:662)
> 	at java.lang.Thread.run(Thread.java:619)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
