zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Enrico Olivelli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1936) Server exits when unable to create data directory due to race
Date Mon, 29 May 2017 09:59:04 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028186#comment-16028186
] 

Enrico Olivelli commented on ZOOKEEPER-1936:
--------------------------------------------

this issue is marked as fixversion = 3.5.4, the PR #75 has been closed.
I cannot find commits in 3.5 branch
which is the actual status ?

I am very interested in this fix

> Server exits when unable to create data directory due to race 
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1936
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1936
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Harald Musum
>            Assignee: Ted Yu
>            Priority: Minor
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-1936.branch-3.4.patch, ZOOKEEPER-1936.patch, ZOOKEEPER-1936.v2.patch,
ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v3.patch, ZOOKEEPER-1936.v4.patch, ZOOKEEPER-1936.v5.patch
>
>
> We sometime see issues with ZooKeeper server not starting and seeing this error in the
log:
> [2014-05-27 09:29:48.248] ERROR   : -               
> .org.apache.zookeeper.server.ZooKeeperServerMain    Unexpected exception,
> exiting abnormally\nexception=\njava.io.IOException: Unable to create data
> directory /home/y/var/zookeeper/version-2\n\tat
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:85)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)\n\tat
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)\n\tat
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)\n\t
> [...]
> Stack trace from JVM gives this:
> "PurgeTask" daemon prio=10 tid=0x000000000201d000 nid=0x1727 runnable
> [0x00007f55d7dc7000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.UnixFileSystem.createDirectory(Native Method)
>     at java.io.File.mkdir(File.java:1310)
>     at java.io.File.mkdirs(File.java:1337)
>     at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>     at org.apache.zookeeper.server.PurgeTxnLog.purge(PurgeTxnLog.java:68)
>     at
> org.apache.zookeeper.server.DatadirCleanupManager$PurgeTask.run(DatadirCleanupManager.java:140)
>     at java.util.TimerThread.mainLoop(Timer.java:555)
>     at java.util.TimerThread.run(Timer.java:505)
> "zookeeper server" prio=10 tid=0x00000000027df800 nid=0x1715 runnable
> [0x00007f55d7ed8000]
>    java.lang.Thread.State: RUNNABLE
>     at java.io.UnixFileSystem.createDirectory(Native Method)
>     at java.io.File.mkdir(File.java:1310)
>     at java.io.File.mkdirs(File.java:1337)
>     at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:84)
>     at
> org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:103)
>     at
> org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at
> org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
>     at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
>     at
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
> [...]
> So it seems that when autopurge is used (as it is in our case), it might happen at the
same time as starting the server itself. In FileTxnSnapLog() it will check if the directory
exists and create it if not. These two tasks do this at the same time, and mkdir fails and
server exits the JVM.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message