hadoop-hdfs-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1257) Race condition on FSNamesystem#recentInvalidateSets introduced by HADOOP-5124
Date Wed, 17 Aug 2011 15:56:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086415#comment-13086415 ]

Eric Payne commented on HDFS-1257:

Hi Nicholas. Thanks for your patience in getting through the reviews of this.

I'm confused as to why 1) you are seeing this error and 2) it is timing out for you. I'm not
seeing that error in my environment. And as for the timeout, even back when the test was taking
3 minutes, it should not have timed out; plenty of unit tests take longer than
3 minutes.

Anyway, as for taking the test out, the reason for doing so would be that it is not sufficient
to thoroughly test the race condition. A unit test just can't stress the namenode in the MiniDFSCluster
enough to exercise this race condition. To hit it, a test must run on a large
cluster with a very active set of DFS operations happening over an extended period of time. There
just isn't enough memory on a single host to create enough DNs in the MiniDFSCluster. And,
even if there were enough memory, a unit test should not run for that long.

> Race condition on FSNamesystem#recentInvalidateSets introduced by HADOOP-5124
> -----------------------------------------------------------------------------
>                 Key: HDFS-1257
>                 URL: https://issues.apache.org/jira/browse/HDFS-1257
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Ramkumar Vadali
>            Assignee: Eric Payne
>             Fix For: 0.23.0
>         Attachments: HDFS-1257.1.20110810.patch, HDFS-1257.2.20110812.patch, HDFS-1257.3.20110815.patch,
> HDFS-1257.4.20110816.patch, HDFS-1257.patch
> HADOOP-5124 provided some improvements to FSNamesystem#recentInvalidateSets. But it introduced
> unprotected access to the data structure recentInvalidateSets. Specifically, FSNamesystem.computeInvalidateWork
> accesses recentInvalidateSets without read-lock protection. If there is concurrent activity
> (like reducing replication on a file) that adds to recentInvalidateSets, the name-node crashes
> with a ConcurrentModificationException.
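The failure mode in the description can be illustrated with a minimal Java sketch. This is not the HDFS-1257 patch and does not use the real FSNamesystem API: `InvalidateSetsSketch`, `scheduleInvalidation`, `nodesWithPendingWork`, and `unguardedIterationFails` are hypothetical names, the map only loosely models recentInvalidateSets (datanode ID to pending block IDs), and guarding it with a `ReentrantReadWriteLock` is an assumption about how such a structure could be protected. It shows that structurally modifying a `HashMap` while an iterator over it is live throws `ConcurrentModificationException`, and that taking the read lock on the iterating path and the write lock on the mutating path keeps a writer from changing the map mid-iteration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for FSNamesystem#recentInvalidateSets:
// maps a datanode ID to the block IDs scheduled for invalidation on it.
class InvalidateSetsSketch {
    private final Map<String, List<Long>> recentInvalidateSets = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Mutating path (e.g. reducing a file's replication): hold the write lock.
    void scheduleInvalidation(String datanode, long blockId) {
        lock.writeLock().lock();
        try {
            recentInvalidateSets.computeIfAbsent(datanode, k -> new ArrayList<>()).add(blockId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Iterating path (the role computeInvalidateWork plays): hold the read lock,
    // so no writer can structurally modify the map while we iterate it.
    List<String> nodesWithPendingWork() {
        lock.readLock().lock();
        try {
            List<String> nodes = new ArrayList<>();
            for (Map.Entry<String, List<Long>> e : recentInvalidateSets.entrySet()) {
                if (!e.getValue().isEmpty()) {
                    nodes.add(e.getKey());
                }
            }
            return nodes;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Unguarded failure mode: adding a key while iterating a HashMap's key set.
    // HashMap iterators are fail-fast (best effort); here the next call to
    // next() sees the changed modCount and throws. Done single-threaded so the
    // exception is deterministic.
    static boolean unguardedIterationFails() {
        Map<String, List<Long>> sets = new HashMap<>();
        sets.put("dn1", new ArrayList<Long>());
        sets.put("dn2", new ArrayList<Long>());
        try {
            for (String node : sets.keySet()) {
                sets.put("extra-" + node, new ArrayList<Long>()); // structural change mid-iteration
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }
}
```

The unguarded demo mutates the map from the iterating thread itself so the exception always fires; in the reported bug the mutation comes from a different thread, so the crash depends on timing and tends to surface only under sustained concurrent load.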

This message is automatically generated by JIRA.

