Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Thu, 2 Jun 2011 00:10:47 +0000 (UTC)
From: "Eli Collins (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: 
 <163335990.61371.1306973447583.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log,
 apparently due to race conditions
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-988:
-----------------------------

    Attachment: hdfs-988-5.patch

Thanks for taking a look, Todd. Updated patch attached.

bq. checks for if (auditLog.isInfoEnabled()) should probably now be (auditLog.isInfoEnabled() && isExternalInvocation()) – otherwise we're doing a needless directory traversal for fsck

Fixed.
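
For anyone following along, here is a minimal sketch of the guard being discussed. It is not the patch itself: the class, the boolean flags, and the getFileInfo() stand-in are all hypothetical, and isExternalInvocation() is modeled as a plain flag rather than whatever the real check does. The point is only that the extra lookup for the audit line is skipped on internal calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the namesystem; names and fields are assumptions.
class AuditGuardSketch {
    final List<String> auditLines = new ArrayList<>();
    int statLookups = 0;                       // counts the "expensive" lookups
    private final boolean auditInfoEnabled;    // stands in for auditLog.isInfoEnabled()
    private boolean externalInvocation;        // stands in for isExternalInvocation()

    AuditGuardSketch(boolean auditInfoEnabled) {
        this.auditInfoEnabled = auditInfoEnabled;
    }

    private boolean isExternalInvocation() {
        return externalInvocation;
    }

    // Stand-in for the directory traversal needed to build the audit line.
    private String getFileInfo(String src) {
        statLookups++;
        return "stat(" + src + ")";
    }

    private void setPermission(String src, String perm) {
        // ... the actual namespace mutation would happen here ...
        if (auditInfoEnabled && isExternalInvocation()) {
            auditLines.add("cmd=setPermission src=" + src + " perm=" + perm
                + " " + getFileInfo(src));
        }
    }

    // Simulates a call arriving from a client.
    void rpcSetPermission(String src, String perm) {
        externalInvocation = true;
        try {
            setPermission(src, perm);
        } finally {
            externalInvocation = false;
        }
    }

    // Simulates an internal caller such as fsck.
    void internalSetPermission(String src, String perm) {
        setPermission(src, perm);
    }
}
```

With this shape, an internal caller pays nothing for auditing even when the audit log is at INFO.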

bq. The following methods currently do logSync() while holding the writeLock, which is expensive:

Fixed. (Only one needed to conditionally call logSync)
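
To illustrate the pattern (a rough sketch only, not code from the patch): the edit is recorded under the write lock, and the sync happens after the lock is released. The class name, the events list, and the fake logSync() below are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; not the actual FSNamesystem code.
class SyncOutsideLockSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    final List<String> events = new ArrayList<>();

    // Stand-in for the edit log sync; records whether the write lock was held.
    private void logSync() {
        events.add("sync:lockHeld=" + fsLock.isWriteLockedByCurrentThread());
    }

    // Caller must hold the write lock; only buffers the edit.
    private void setReplicationInternal(String src, short replication) {
        assert fsLock.isWriteLockedByCurrentThread();
        events.add("edit:" + src + "=" + replication);
    }

    void setReplication(String src, short replication) {
        fsLock.writeLock().lock();
        try {
            setReplicationInternal(src, replication);
        } finally {
            fsLock.writeLock().unlock();
        }
        // Sync after releasing the lock so other operations are not blocked on I/O.
        logSync();
    }
}
```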

bq. seems strange that some of the xInternal() methods take the write lock themselves (eg setReplicationInternal) whereas others assume the caller takes the write lock (eg createSymlinkInternal). We should be consistent.

Latest patch makes them more consistent; I also refactored out a couple of new xInternal methods. In a couple of places (eg deleteInternal and getListing) I didn't hoist up the locking because it would make the locking too coarse-grained (eg would result in syncing the log w/ the lock held).

bq. for those methods that don't explicitly take the write lock, we should either add an assert hasWriteLock() or a comment explaining why the lock is not necessary (eg internalReleaseLease, reassignLease, finalizeINodeFileUnderConstruction)

Done. For FSDirectory I made the unprotectedX methods actually unprotected and moved the locking to the caller (except for FSEditLogLoader which calls the unprotected methods directly on purpose - I doubt this really saves us that much). These methods (per their name) are now intentionally unprotected.
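
As a rough picture of the convention (a sketch under assumptions, not the patch): the unprotected method does no locking at all, the internal method asserts that the write lock is held, and the public entry point takes the lock. All names except hasWriteLock() are made up for the example, and the assert only fires when the JVM runs with -ea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; the map stands in for the directory tree.
class LockAssertSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Short> replication = new HashMap<>();

    boolean hasWriteLock() {
        return lock.isWriteLockedByCurrentThread();
    }

    // Intentionally unprotected: takes no lock, the caller is responsible.
    void unprotectedSetReplication(String src, short value) {
        replication.put(src, value);
    }

    // Does not take the lock itself, so it asserts that the caller did.
    private void setReplicationInternal(String src, short value) {
        assert hasWriteLock() : "caller must hold the write lock";
        unprotectedSetReplication(src, value);
    }

    // Public entry point: the locking lives here.
    void setReplication(String src, short value) {
        lock.writeLock().lock();
        try {
            setReplicationInternal(src, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    short getReplication(String src) {
        lock.readLock().lock();
        try {
            return replication.getOrDefault(src, (short) 0);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```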

bq. comment for endCheckpoint says "not started" but should say "not ended".  same with updatePipeline.

Both fixed.

bq. why doesn't getListing need the read lock?

Because its callees (check*, getListing) take the lock.

bq. I noticed that nextGenerationStamp() doesn't logSync() – that seems dangerous, since after a restart we might hand out a duplicate genstamp.

Good catch. I made sure all callers sync the log (this was only missing from the updateBlockForPipeline path). nextGenerationStamp is always called with the lock held so I asserted that and removed the lock acquisition from this method.
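
Roughly, the resulting shape looks like the sketch below. This is an illustration under assumptions, not the patch: the two stamp fields and the fake logSync() are invented to show that a stamp is made durable before it is handed back, and the real updateBlockForPipeline takes arguments that are omitted here.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of "caller holds the lock, caller syncs the log".
class GenStampSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private long generationStamp = 1000;
    private volatile long lastLoggedStamp = -1;  // edit buffered in memory
    private volatile long lastSyncedStamp = -1;  // edit made durable

    // No lock acquisition here any more; the caller must already hold it.
    private long nextGenerationStamp() {
        assert fsLock.isWriteLockedByCurrentThread();
        long gs = ++generationStamp;
        lastLoggedStamp = gs;  // stands in for logging the new stamp
        return gs;
    }

    // Stand-in for the edit log sync.
    private synchronized void logSync() {
        lastSyncedStamp = lastLoggedStamp;
    }

    long updateBlockForPipeline() {
        long gs;
        fsLock.writeLock().lock();
        try {
            gs = nextGenerationStamp();
        } finally {
            fsLock.writeLock().unlock();
        }
        // Sync before returning so a restart cannot hand out the same stamp again.
        logSync();
        return gs;
    }

    long lastSyncedStamp() {
        return lastSyncedStamp;
    }
}
```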

> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20-append, 0.22.0
>
>         Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the saveNamespace command. This can corrupt the edits log. The problem is that when the NN enters safemode, there could still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when executed, would save an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira