zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2789) Reassign `ZXID` for solving 32bit overflow problem
Date Thu, 14 Dec 2017 06:28:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16290405#comment-16290405 ]

ASF GitHub Bot commented on ZOOKEEPER-2789:
-------------------------------------------

Github user asdf2014 commented on the issue:

    https://github.com/apache/zookeeper/pull/262
  
    Hi, @phunt. Indeed, the `FastLeaderElection` algorithm is very efficient: most leader elections
finish within a few hundred milliseconds. However, real-time stream processing frameworks such as
Apache Kafka and Apache Storm can put heavy pressure on a ZooKeeper cluster when they carry a large
volume of business data or processing logic. As a result, leader elections may be triggered
frequently, and the election process can become time-consuming.


> Reassign `ZXID` for solving 32bit overflow problem
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-2789
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2789
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.3
>            Reporter: Benedict Jin
>            Assignee: Benedict Jin
>             Fix For: 3.6.0
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> At `1k/s` write operations, the 32-bit counter in the low half of the `ZXID` is exhausted after
> $2^{32} / (86400 \times 1000) \approx 49.7$ days. If we instead reassign the `ZXID` to use 16 bits
> for the `epoch` and 48 bits for the `counter`, the problem will not occur for
> $\min(2^{16} / 365,\ 2^{48} / (86400 \times 1000 \times 365)) \approx \min(179.6,\ 8925.5) = 179.6$ years
> (the first term assumes one new epoch, i.e. one leader election, per day).
> However, the `ZXID` is a `long`. In the JVM, a read or write of a `long` (and likewise a `double`)
> may be split into separate operations on its high 32 bits and its low 32 bits. Because the `ZXID`
> variable is neither declared `volatile` nor wrapped in the corresponding reference type
> (`Long` / `Double`), its reads and writes are [not guaranteed to be atomic](https://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html#jls-17.7).
> Thus, once the epoch/counter boundary no longer falls between the upper and lower 32-bit halves
> of the `long`, a concurrent reader could combine the halves of two different values, so there may
> be a concurrency problem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
