hadoop-hdfs-issues mailing list archives

From "Wellington Chevreuil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11515) -du throws ConcurrentModificationException
Date Wed, 23 Aug 2017 21:59:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16139215#comment-16139215 ]

Wellington Chevreuil commented on HDFS-11515:
---------------------------------------------

I believe the problem is that *DirectoryWithSnapshotFeature.computeContentSummary4Snapshot*
ends up calling *ContentSummaryComputationContext.reportDeletedSnapshottedNode* from within the
iteration over *ContentSummaryComputationContext.deletedSnapshottedNodes* performed by *ContentSummaryComputationContext.tallyDeletedSnapshottedINodes*.

Here is the loop in *ContentSummaryComputationContext.tallyDeletedSnapshottedINodes*:

{noformat}
    for (INode node : deletedSnapshottedNodes) {
      if (!nodeIncluded(node)) {
        node.computeContentSummary(Snapshot.CURRENT_STATE_ID, this);
      }
    }
{noformat}

Here *node* is an instance of *INodeDirectory*. Notice that "*this*" is passed to *INodeDirectory.computeContentSummary*.

From the *INodeDirectory.computeContentSummary* code below, execution reaches *DirectoryWithSnapshotFeature.computeContentSummary4Snapshot*:

{noformat}
    final DirectoryWithSnapshotFeature sf = getDirectoryWithSnapshotFeature();
    if (sf != null && snapshotId == Snapshot.CURRENT_STATE_ID) {
      sf.computeContentSummary4Snapshot(summary);
    }
{noformat}

Inside *DirectoryWithSnapshotFeature.computeContentSummary4Snapshot*, we can see it calls
*ContentSummaryComputationContext.reportDeletedSnapshottedNode* on its *context* param,
which is the same instance over whose collection we are already iterating.

{noformat}
      for(INode deletedNode : d.getChildrenDiff().getList(ListType.DELETED)) {
        context.reportDeletedSnapshottedNode(deletedNode);
      }
{noformat}

So the very *deletedSnapshottedNodes* collection we are currently iterating over gets a new
element added to it, which then triggers the ConcurrentModificationException on the next call
to *Iterator.next*.

A simple fix of iterating over a copy of *deletedSnapshottedNodes* seems to solve the problem.
I tried this change in *ContentSummaryComputationContext.tallyDeletedSnapshottedINodes* and could
no longer reproduce the error with the steps from the jira's initial description:

{noformat}
    for (INode node : new HashSet<>(deletedSnapshottedNodes)) {
      if (!nodeIncluded(node)) {
        node.computeContentSummary(Snapshot.CURRENT_STATE_ID, this);
      }
    }
{noformat}
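
As a standalone sketch of why iterating a copy avoids the exception (again with hypothetical names): the loop drives the copy's iterator, while additions mutate only the original set, so the iterator never observes a structural modification. One consequence to keep in mind is that elements added during the pass are not visited by that same pass:

```java
import java.util.HashSet;
import java.util.Set;

public class CopyIterationDemo {
  static Set<String> tally() {
    Set<String> deleted = new HashSet<>();
    deleted.add("d2");
    deleted.add("d3");
    // Iterate a snapshot copy; mutate the original freely.
    for (String node : new HashSet<>(deleted)) {
      deleted.add(node + "/child"); // safe: the copy is untouched
    }
    return deleted; // now holds d2, d3, d2/child, d3/child
  }

  public static void main(String[] args) {
    System.out.println(tally().size()); // prints "4"
  }
}
```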

> -du throws ConcurrentModificationException
> ------------------------------------------
>
>                 Key: HDFS-11515
>                 URL: https://issues.apache.org/jira/browse/HDFS-11515
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, shell
>    Affects Versions: 2.8.0, 3.0.0-alpha2
>            Reporter: Wei-Chiu Chuang
>            Assignee: Istvan Fajth
>         Attachments: HDFS-11515.001.patch, HDFS-11515.002.patch, HDFS-11515.003.patch, HDFS-11515.004.patch, HDFS-11515.test.patch
>
>
> HDFS-10797 fixed a disk summary (-du) bug, but it introduced a new bug.
> The bug can be reproduced running the following commands:
> {noformat}
> bash-4.1$ hdfs dfs -mkdir /tmp/d0
> bash-4.1$ hdfs dfsadmin -allowSnapshot /tmp/d0
> Allowing snaphot on /tmp/d0 succeeded
> bash-4.1$ hdfs dfs -touchz /tmp/d0/f4
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1
> bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s1
> Created snapshot /tmp/d0/.snapshot/s1
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d2/d4
> bash-4.1$ hdfs dfs -mkdir /tmp/d0/d1/d3/d5
> bash-4.1$ hdfs dfs -createSnapshot /tmp/d0 s2
> Created snapshot /tmp/d0/.snapshot/s2
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2/d4
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d2
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3/d5
> bash-4.1$ hdfs dfs -rmdir /tmp/d0/d1/d3
> bash-4.1$ hdfs dfs -du -h /tmp/d0
> du: java.util.ConcurrentModificationException
> 0 0 /tmp/d0/f4
> {noformat}
> A ConcurrentModificationException forced du to terminate abruptly.
> Correspondingly, NameNode log has the following error:
> {noformat}
> 2017-03-08 14:32:17,673 WARN org.apache.hadoop.ipc.Server: IPC Server handler 4 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getContentSummary from 10.0.0.198:49957 Call#2 Retry#0
> java.util.ConcurrentModificationException
>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
>         at java.util.HashMap$KeyIterator.next(HashMap.java:956)
>         at org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.tallyDeletedSnapshottedINodes(ContentSummaryComputationContext.java:209)
>         at org.apache.hadoop.hdfs.server.namenode.INode.computeAndConvertContentSummary(INode.java:507)
>         at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getContentSummary(FSDirectory.java:2302)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getContentSummary(FSNamesystem.java:4535)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getContentSummary(NameNodeRpcServer.java:1087)
>         at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getContentSummary(AuthorizationProviderProxyClientProtocol.java:563)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getContentSummary(ClientNamenodeProtocolServerSideTranslatorPB.java:873)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> {noformat}
> The bug is due to an improper use of HashSet, not concurrent operations. Basically, a
HashSet cannot be updated while an iterator is traversing it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
