From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6428) TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
Date Tue, 20 May 2014 15:50:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003497#comment-14003497 ]

Yongjun Zhang commented on HDFS-6428:
-------------------------------------

Thanks Colin for the off-line discussion. He suggested finding out what caused the runtime
increase, and I figured out that it is the highlighted line in the block below that causes
the long runtime once the synchronized statement is added. This is understandable, because
multiple threads synchronize at that point. Without the synchronization, however, it would
not be safe.

{code}
  void addBlockPool(final String bpid, final Configuration conf) throws IOException {
    ...
    List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
    for (final FsVolumeImpl v : volumes) {
      Thread t = new Thread() {
        public void run() {
          try {
            ...
            v.addBlockPool(bpid, conf); // <=== when synchronized, this line caused the slow performance
            ...
{code}
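
To see the effect in isolation, here is a small standalone sketch (toy code, not the Hadoop sources; the thread count and sleep time are made up): when every per-volume thread has to enter the same synchronized block, the threads run one at a time, so the total time grows roughly linearly with the number of volumes instead of staying flat.
{code}
// Toy sketch: a shared lock inside per-volume threads serializes them.
public class LockSerializationDemo {
  private static final Object SHARED_LOCK = new Object();

  static long run(int numThreads, final boolean locked) throws InterruptedException {
    long start = System.nanoTime();
    Thread[] workers = new Thread[numThreads];
    for (int i = 0; i < numThreads; i++) {
      workers[i] = new Thread() {
        @Override
        public void run() {
          if (locked) {
            synchronized (SHARED_LOCK) { scanVolume(); } // one thread at a time
          } else {
            scanVolume();                                // fully parallel
          }
        }
      };
      workers[i].start();
    }
    for (Thread t : workers) {
      t.join();
    }
    return (System.nanoTime() - start) / 1000000L;
  }

  static void scanVolume() {
    try {
      Thread.sleep(200); // stands in for scanning one volume's block pool
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("with shared lock:    " + run(8, true) + " ms");  // ~1600 ms
    System.out.println("without shared lock: " + run(8, false) + " ms"); // ~200 ms
  }
}
{code}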

Why did we not run into a problem more often at the highlighted line above? This question
made me realize that bpSlices is a {{ConcurrentHashMap}}, which is designed to take care of
most concurrency issues:
{code}
The allowed concurrency among update operations is guided by the optional concurrencyLevel
constructor argument (default 16), which is used as a hint for internal sizing. The table
is internally partitioned to try to permit the indicated number of concurrent updates without
contention. Because placement in hash tables is essentially random, the actual concurrency
will vary. Ideally, you should choose a value to accommodate as many threads as will ever
concurrently modify the table. 
{code}
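
To make the difference concrete, here is a small standalone demo (again toy code, not the Hadoop sources; it does the modification in a single thread so the result is deterministic, whereas the real CME came from two threads): a fail-fast {{HashMap}} iterator throws CME when the map is structurally modified during iteration, while a {{ConcurrentHashMap}} iterator is weakly consistent and does not.
{code}
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy demo: modify the map while iterating it, as shutdown() and
// addBlockPool() could do concurrently.
public class CmeDemo {
  static void iterateWhileModifying(Map<String, String> map) {
    map.put("bp-1", "a");
    map.put("bp-2", "b");
    try {
      for (String bpid : map.keySet()) {
        map.put("bp-3", "c"); // structural modification during iteration
      }
      System.out.println(map.getClass().getSimpleName() + ": no exception");
    } catch (ConcurrentModificationException e) {
      System.out.println(map.getClass().getSimpleName() + ": CME thrown");
    }
  }

  public static void main(String[] args) {
    iterateWhileModifying(new HashMap<String, String>());           // CME thrown
    iterateWhileModifying(new ConcurrentHashMap<String, String>()); // no exception
  }
}
{code}
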
So I think adding another level of synchronization to addBlockPool and some of the other
operations is not necessary (though some operations may really need it). The real fix should
be based on the ConcurrentHashMap requirements.
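
As a rough sketch of that direction (the field and method names below are illustrative stand-ins, not the actual FsVolumeImpl code), the idea is to keep the map a {{ConcurrentHashMap}} and rely on its atomic per-key operations rather than an extra synchronized layer, so that iteration in shutdown stays safe:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in for the bpSlices handling; the real class holds
// per-block-pool state and does real cleanup in shutdown().
public class VolumeSketch {
  private final Map<String, Object> bpSlices =
      new ConcurrentHashMap<String, Object>();

  void addBlockPool(String bpid, Object slice) {
    // putIfAbsent is atomic on a ConcurrentHashMap, so no external lock
    // is needed to make the update safe against concurrent adders.
    bpSlices.putIfAbsent(bpid, slice);
  }

  void shutdown() {
    // The iterator is weakly consistent: it never throws CME even if
    // another thread adds or removes a block pool while we iterate.
    for (Map.Entry<String, Object> entry : bpSlices.entrySet()) {
      // ... per-pool cleanup would go here ...
    }
  }
}
{code}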

The other day when I worked out the patch, the issue was very reproducible in my environment,
but unfortunately it is not anymore (because I cleaned my build, and because of the
intermittent nature of this issue), so I can't verify whether a new fix resolves the problem.
I will keep watching to see whether the issue shows up again.

BTW [~daryn], thanks for your comment: "Do we know what else is modifying bpSlices and causing
the CME? Hopefully we aren't masking another bug." The discussion we have had so far is along
this line.

Thanks.

> TestWebHdfsWithMultipleNameNodes failed with ConcurrentModificationException
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-6428
>                 URL: https://issues.apache.org/jira/browse/HDFS-6428
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-6428.001.patch
>
>
> TestWebHdfsWithMultipleNameNodes failed as follows:
> {code}
> Running org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 8.643 sec <<< FAILURE! - in org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes
> org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes  Time elapsed: 3.771 sec <<< ERROR!
> java.util.ConcurrentModificationException: null
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:934)
>         at java.util.HashMap$EntryIterator.next(HashMap.java:932)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:251)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:249)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1389)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1304)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1555)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1530)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1514)
>         at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:99)
> {code}


