hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
Date Thu, 02 Jun 2011 05:25:47 GMT

     [ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-988:
-----------------------------

    Attachment: 988-fixups.txt

Attaching a few fixups on top of hdfs-988-5.patch, related to the comments below:


Regarding the question about computeDatanodeWork/heartbeatCheck:

computeDatanodeWork calls blockManager.computeReplicationWork and blockManager.computeInvalidateWork.
In the case of computeReplicationWork, it might schedule some replications. This seems OK
- worst case we get some extra replicas which will get fixed up later. In the case of computeInvalidateWork,
it calls invalidateWorkForOneNode which takes the write lock and then checks safe mode before
scheduling any deletions.
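
To spell out the pattern that makes invalidateWorkForOneNode safe (a sketch only -- writeLock()/
writeUnlock()/isInSafeMode() are the real FSNamesystem names, the body and signature are illustrative):

    int invalidateWorkForOneNode(String nodeId) {
      writeLock();
      try {
        // Re-check safe mode after the write lock is held, so a concurrent
        // safe-mode entry can't race us into scheduling deletions.
        if (isInSafeMode()) {
          return 0;
        }
        // ... schedule pending block invalidations for this datanode ...
        return 0;
      } finally {
        writeUnlock();
      }
    }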

In heartbeatCheck, I think we can simply add another "if (isInSafeMode()) return" check right
after it takes the writeLock when it finds a dead node. That way, if it races with safe-mode entry,
it still doesn't take any actions based on the stale state. Either way, I don't think this could
corrupt anything, since it won't write to the edit log.
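
Concretely, something along these lines (again just a sketch; the real heartbeatCheck does more,
and everything beyond writeLock()/isInSafeMode() is illustrative):

    void heartbeatCheck() {
      // ... scan heartbeats outside the namesystem lock, find a dead node ...
      writeLock();
      try {
        if (isInSafeMode()) {
          return;  // raced with safe-mode entry: take no action on the dead node
        }
        // ... remove the dead datanode and update block/replication state ...
      } finally {
        writeUnlock();
      }
    }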


Some other notes:
- isLockedReadOrWrite should be checking this.fsLock.getReadHoldCount() rather than getReadLockCount()
(see the sketch after this list)
- FSDirectory#bLock says it protects the block map, but it also protects the directory, right?
we should update the comment and perhaps the name.
- various functions don't take the read lock because they call functions in FSDirectory that
take FSDirectory.bLock. This seems incorrect, since, for example, getListing() racing against
open() with overwrite=true could return the directory with the file deleted but the new one
not there yet. I guess what's confusing me is that it's not clear why some functions don't
need readLock when they perform read operations. When is just the FSDirectory lock sufficient?
It looks like a lot of the test failures above are due to this.
- handleHeartbeat calls getDatanode() while only holding locks on heartbeats and datanodeMap,
but registerDatanode mutates datanodeMap without locking either.
- getDataNodeInfo seems like an unused function with no locking - can we remove it?
- several other places access datanodeMap with synchronization on that object itself; unprotectedAddDatanode
should assert that it holds that monitor lock (also sketched after this list)
- when loading the edit log, why doesn't loadFSEdits take a write lock on the namesystem before
it starts? then we could add all of the asserts and not worry about it.
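
Sketches for the isLockedReadOrWrite and datanodeMap items above (assuming fsLock is the
namesystem's ReentrantReadWriteLock and datanodeMap is the map guarded by its own monitor;
the bodies are illustrative, not the patch itself):

    boolean isLockedReadOrWrite() {
      // getReadLockCount() counts read holds across *all* threads, so a check based
      // on it passes whenever anybody is reading; getReadHoldCount() is specific to
      // the calling thread, which is what an "am I holding the lock?" check needs.
      return this.fsLock.getReadHoldCount() > 0
          || this.fsLock.getWriteHoldCount() > 0;
    }

    void unprotectedAddDatanode(DatanodeDescriptor node) {
      // Callers are expected to synchronize on datanodeMap; make that explicit.
      assert Thread.holdsLock(datanodeMap);
      datanodeMap.put(node.getStorageID(), node);
    }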


- it looks like saving the image no longer works, since saveFilesUnderConstruction now takes
the readLock, but it's being called by a different thread than took the write lock in saveNamespace.
So, it deadlocks. At first I thought this could be solved by just making saveNamespace take
a read lock instead of write lock, but that actually doesn't work due to fairness -- what
can happen is that saveNamespace takes readLock, then some other thread comes along and queues
up for the write lock. At that point, no further readers are allowed to take the read lock,
because it's a fair lock. So, the image-writer thread locks up.
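
To make the fairness point concrete, here is a minimal standalone demo (my own illustration,
not HDFS code) of a fair ReentrantReadWriteLock parking a second reader behind a queued writer
while the first reader is still held -- the same shape as saveNamespace holding the read lock
while the image-saver thread tries to take it:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class FairLockDemo {
      public static void main(String[] args) throws Exception {
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true); // fair

        lock.readLock().lock();                 // like saveNamespace taking the readLock

        Thread writer = new Thread(() -> {
          lock.writeLock().lock();              // some other op queues for the write lock
          lock.writeLock().unlock();
        });
        writer.start();
        Thread.sleep(100);                      // let the writer enqueue

        Thread imageSaver = new Thread(() -> {
          lock.readLock().lock();               // like saveFilesUnderConstruction
          System.out.println("second reader finally got the lock");
          lock.readLock().unlock();
        });
        imageSaver.start();

        imageSaver.join(1000);
        // With a fair lock this prints true: the second reader is parked behind the
        // queued writer even though only a read lock is held. If the first reader
        // then waits on the second (as saveNamespace waits on its image-saver
        // threads), that's the deadlock.
        System.out.println("second reader stuck? " + imageSaver.isAlive());

        lock.readLock().unlock();               // releasing the first reader unwedges everything
      }
    }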


Optimizations to address later:
- When create() is called with the overwrite flag true, that calls delete() which will logSync()
while holding the lock. We can hold off on fixing it since it's a performance problem, not
a correctness one, and the operation is fairly rare (the usual shape of the fix is sketched
after this list).
- getAdditionalBlock doesn't logSync() - I think there's another issue pending about that
since it will affect HA. Let's address later.
- checkFileProgress doesn't really need the write lock
- seems like saveNamespace could safely just take the read lock to allow other readers to
keep working
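
For the first item, the usual shape of the fix (illustrative names, not part of this patch)
is to keep the blocking logSync() outside the namesystem lock:

    void deleteInternal(String src) throws IOException {
      writeLock();
      try {
        // ... mutate the namespace and append the delete record to the edit log ...
      } finally {
        writeUnlock();
      }
      // Sync after the lock is released, so other readers and writers
      // aren't stalled behind the disk sync.
      getEditLog().logSync();
    }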


Nits:
- Typo: "Cnnot concat"
- rollEditLog has comment saying "Checkpoint not created"
- rollFSImage has the same issue, but at least has to do with checkpoints, so could be correct


> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20-append, 0.22.0
>
>         Attachments: 988-fixups.txt, HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch,
> hdfs-988-4.patch, hdfs-988-5.patch, hdfs-988-b22-1.patch, hdfs-988.txt, saveNamespace.txt,
> saveNamespace_20-append.patch
>
>
> The administrator puts the namenode in safemode and then issues the saveNamespace command.
> This can corrupt the edits log. The problem is that when the NN enters safemode, there could
> still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when
> executed, would save an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
