hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1597) Batched edit log syncs can reset synctxid throw assertions
Date Mon, 07 Feb 2011 22:01:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991646#comment-12991646 ]

Konstantin Shvachko commented on HDFS-1597:

The patch needs to be updated.
I don't see where {{saveNamespace()}} calls {{logSyncAll()}}. {{logSyncAll()}} is called only
by {{enterSafeMode()}}.

The main problem seems to be that {{logSync()}} does not hold the write lock. So in a race
with {{saveNamespace()}} it can kick in at any time. The only way to prevent inconsistencies
is to make sure all threads waiting to {{logSync()}} have everything synced already.
In other words all transactions that started before {{saveNamespace()}} grabbed the write
lock should complete, and no new transactions should be allowed to start while {{saveNamespace()}}
is in progress.
So {{saveNamespace()}} must call {{logSyncAll()}} before doing anything with the image or

Therefore, moving the assert down is absolutely correct, imo. If a thread sees that its transaction
is synced, it should not touch edit streams.
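
To illustrate the ordering argued for above, here is a minimal sketch. It is not the real {{FSEditLog}}: the class {{EditLogSketch}}, its fields, and the helpers {{addStream()}}, {{closeAllStreams()}} and {{getSyncTxId()}} are hypothetical, and an explicit check stands in for the {{assert}} so that it fires even without {{-ea}}. It only shows that the "already synced" test comes first, so a thread whose transaction was batched into an earlier sync returns without looking at the edit streams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for FSEditLog; not the Hadoop class.
class EditLogSketch {
    private final List<StringBuilder> editStreams = new ArrayList<>();
    private long txid = 0;         // last transaction written to the streams
    private long synctxid = 0;     // last transaction known to be synced
    private boolean isSyncRunning = false;

    synchronized void addStream() { editStreams.add(new StringBuilder()); }

    // Stands in for saveNamespace() closing the edit streams.
    synchronized void closeAllStreams() { editStreams.clear(); }

    synchronized long getSyncTxId() { return synctxid; }

    // Records one operation and returns its transaction id.
    synchronized long logEdit(String op) {
        txid++;
        for (StringBuilder s : editStreams) {
            s.append(op).append('\n');
        }
        return txid;
    }

    // Returns true if this call performed a sync, false if the
    // transaction had already been synced by another call.
    boolean logSync(long mytxid) {
        long syncStart;
        synchronized (this) {
            // Wait while another thread is syncing and we are not yet covered.
            while (mytxid > synctxid && isSyncRunning) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted in logSync", e);
                }
            }
            // Already batched into an earlier sync: do not touch the streams.
            if (mytxid <= synctxid) {
                return false;
            }
            // Only now is it meaningful to require at least one stream.
            if (editStreams.isEmpty()) {
                throw new IllegalStateException("no editlog streams");
            }
            syncStart = txid;
            isSyncRunning = true;
        }
        // The real flush to disk would happen here, outside the lock.
        synchronized (this) {
            synctxid = syncStart;
            isSyncRunning = false;
            notifyAll();
        }
        return true;
    }

    // Syncs everything written so far, as saveNamespace() would need to
    // do before touching the image or the streams.
    boolean logSyncAll() {
        long last;
        synchronized (this) {
            last = txid;
        }
        return logSync(last);
    }
}
```

In this sketch, if a caller runs {{logSyncAll()}} and then {{closeAllStreams()}}, a late {{logSync()}} for an earlier transaction returns {{false}} instead of tripping the stream check. With the check at the top of {{logSync()}}, the same call would fail even though there is nothing left for it to sync.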

> Batched edit log syncs can reset synctxid throw assertions
> ----------------------------------------------------------
>                 Key: HDFS-1597
>                 URL: https://issues.apache.org/jira/browse/HDFS-1597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.22.0
>         Attachments: hdfs-1597.txt, illustrate-test-failure.txt
> The top of FSEditLog.logSync has the following assertion:
> {code}
>         assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after checking to see if the sync was already batched in by another thread.
> This is related to a second bug in which the same case causes synctxid to be reset to

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

