zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2789) Reassign `ZXID` for solving 32bit overflow problem
Date Fri, 16 Jun 2017 07:44:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051530#comment-16051530 ]

ASF GitHub Bot commented on ZOOKEEPER-2789:
-------------------------------------------

Github user yunfan123 commented on the issue:

    https://github.com/apache/zookeeper/pull/262
  
    Hi, @asdf2014 
    In most cases, I don't think the epoch can overflow 16 bits.
    In general, ZooKeeper leader elections are very rare, and an election may take several
seconds or even several minutes to finish.
    ZooKeeper is also totally unavailable during leader election.
    If the ZooKeeper ensemble you use can overflow a 16-bit epoch, then that ensemble is
totally unreliable.
    Finally, compatibility with old versions is really important.
    Without it, I must restart all my ZooKeeper nodes.
    Every node would need to reload its snapshot and log from disk, which costs a lot of time.
    I believe this upgrade process is unacceptable to most ZooKeeper users.



> Reassign `ZXID` for solving 32bit overflow problem
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-2789
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2789
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.3
>            Reporter: Benedict Jin
>            Assignee: Benedict Jin
>             Fix For: 3.6.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> At `1k/s` ops, the 32-bit `counter` part of the `ZXID` is exhausted after
> $2^{32} / (86400 \times 1000) \approx 49.7$ days. But if we reassign the `ZXID` as 16 bits
> for `epoch` and 48 bits for `counter`, the problem will not occur until after
> $\min(2^{16} / 365,\ 2^{48} / (86400 \times 1000 \times 365)) \approx \min(179.6,\ 8925.5) = 179.6$
> years (the first term corresponds to one epoch change per day).
> However, the `ZXID` is of type `long`. In the JVM, a read or write of a `long` (and likewise
> of a `double`) may be split into two operations, one on the high 32 bits and one on the low
> 32 bits. Because the `ZXID` variable is not declared `volatile` and is not boxed as the
> corresponding reference type (`Long` / `Double`), accessing it is a
> [non-atomic operation](https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7).
> Thus, if the high 32 bits and low 32 bits are repartitioned across the entire 64 bits of the
> `long`, there may be a concurrency problem.
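
The arithmetic and the proposed bit layout in the description above can be checked with a small sketch. This is only an illustration under stated assumptions: the class `ZxidLayoutSketch` and its methods are hypothetical names, not ZooKeeper's API, and the 16/48 split is the proposal from this issue rather than the layout ZooKeeper currently uses (32 bits each).

```java
public class ZxidLayoutSketch {
    // Hypothetical 16-bit epoch / 48-bit counter split proposed in the issue.
    static final int COUNTER_BITS = 48;
    static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    // Pack an epoch into the high 16 bits and a counter into the low 48 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    // Unsigned shift so that epochs with the top bit set are not sign-extended.
    static long epochOf(long zxid) {
        return zxid >>> COUNTER_BITS;
    }

    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    // Days until a counter of the given width wraps at a constant write rate.
    static double daysToExhaust(int counterBits, long opsPerSecond) {
        return Math.pow(2, counterBits) / (86400.0 * opsPerSecond);
    }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 123456789L);
        System.out.println("epoch=" + epochOf(zxid) + " counter=" + counterOf(zxid));
        System.out.println("32-bit counter, days:  " + daysToExhaust(32, 1000));
        System.out.println("48-bit counter, years: " + daysToExhaust(48, 1000) / 365);
    }
}
```

Because the whole `ZXID` lives in a single `long`, the atomicity concern applies regardless of where the epoch/counter boundary sits; holding the value in a `volatile` field or a `java.util.concurrent.atomic.AtomicLong` is the usual way to get atomic reads and writes.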



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
