hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
Date Tue, 23 Oct 2012 20:15:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-3616:
----------------------------

    Attachment: HDFS-3616.trunk.001.patch

After checking the code, I guess the exception is caused by this process:

1. In DataNode#shutdown(), DataNode#shouldRun is set to false.

2. BPServiceActor#run() stops its loop and then runs BPServiceActor#cleanUp().

3. While executing BPServiceActor#cleanUp(), DataNode#shutdownBlockPool() is called, where
blockPoolManager.remove(bpos) is executed before "this.blockPoolManager.shutDownAll();" is
called in DataNode#shutdown(). Thus the corresponding BPOfferService cannot be seen or shut down
by blockPoolManager#shutDownAll(), since it has already been removed from BlockPoolManager#offerServices.

4. The actor thread continues running DataNode#shutdownBlockPool(), which eventually tries
to remove the record from FsVolumeImpl#bpSlices, while the DataNode shutdown thread runs into
FsVolumeImpl#shutdown(), which iterates over bpSlices. Thus a ConcurrentModificationException
may be thrown.
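
To illustrate step 4, here is a minimal sketch of the failure mode. It is not the HDFS code: the class BpSlicesCmeDemo, the method iterateWhileRemoving, and the "BP-n" keys are made up for illustration. The removal is done inside the loop on a single thread so the result is deterministic; in the DataNode the same modification would come from the actor thread.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it only models a map that plays the role of bpSlices.
class BpSlicesCmeDemo {

    /**
     * Iterates the map the way a shutdown loop would, and removes a different
     * entry during the first iteration (standing in for the actor thread).
     * Returns true if the iterator detected the modification.
     */
    static boolean iterateWhileRemoving(Map<String, String> bpSlices) {
        bpSlices.put("BP-1", "slice-1");
        bpSlices.put("BP-2", "slice-2");
        bpSlices.put("BP-3", "slice-3");
        boolean removed = false;
        try {
            for (Map.Entry<String, String> entry : bpSlices.entrySet()) {
                if (!removed) {
                    // Remove an entry other than the one currently visited.
                    String victim = entry.getKey().equals("BP-1") ? "BP-2" : "BP-1";
                    bpSlices.remove(victim);
                    removed = true;
                }
            }
        } catch (ConcurrentModificationException e) {
            // HashMap iterators are fail-fast: the next call to next() notices
            // that the map was structurally modified behind the iterator's back.
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("HashMap iterator failed: "
                + iterateWhileRemoving(new HashMap<>()));
    }
}
```

With two real threads the exception is timing dependent, which would match the test failing only occasionally in precommit builds.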

So to avoid changing other code, maybe we can simply change bpSlices from HashMap to ConcurrentHashMap?
A simple patch based on this is attached.
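
The idea can be sketched roughly as follows. This is not the attached patch, only an illustration under assumptions: VolumeSketch is a hypothetical stand-in for FsVolumeImpl, the String values stand in for the real slice objects, and runRace is a made-up helper that runs the removal and the iteration on two threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for FsVolumeImpl; only the bpSlices map is modeled.
class VolumeSketch {
    // ConcurrentHashMap iterators are weakly consistent and never throw
    // ConcurrentModificationException.
    private final Map<String, String> bpSlices = new ConcurrentHashMap<>();

    void addBlockPool(String bpid) {
        bpSlices.put(bpid, "slice-" + bpid);
    }

    // Stands in for the removal done on the actor thread.
    void shutdownBlockPool(String bpid) {
        bpSlices.remove(bpid);
    }

    // Stands in for the iteration done on the DataNode shutdown thread.
    // Returns how many slices were visited.
    int shutdown() {
        int visited = 0;
        for (Map.Entry<String, String> entry : bpSlices.entrySet()) {
            visited++; // the real code would shut down entry.getValue() here
        }
        return visited;
    }

    // Runs removal and iteration concurrently; returns the slices visited.
    static int runRace(int pools) {
        VolumeSketch volume = new VolumeSketch();
        for (int i = 0; i < pools; i++) {
            volume.addBlockPool("BP-" + i);
        }
        Thread actor = new Thread(() -> {
            for (int i = 0; i < pools; i++) {
                volume.shutdownBlockPool("BP-" + i);
            }
        });
        actor.start();
        int visited = volume.shutdown();
        try {
            actor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println("visited " + runRace(1000) + " of 1000 slices");
    }
}
```

Note that this only removes the exception. The iteration may or may not see an entry that is being removed at the same time, so the number of slices visited can vary from run to run.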
                
> TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3616
>                 URL: https://issues.apache.org/jira/browse/HDFS-3616
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Uma Maheswara Rao G
>            Assignee: Jing Zhao
>         Attachments: HDFS-3616.trunk.001.patch
>
>
> I have seen this in precommit build #2743
> {noformat}
> java.util.ConcurrentModificationException
> 	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> 	at java.util.HashMap$EntryIterator.next(HashMap.java:834)
> 	at java.util.HashMap$EntryIterator.next(HashMap.java:832)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324)
> 	at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304)
> 	at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
