mesos-issues mailing list archives

From "Benjamin Bannier (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-9227) `Value::Scalar` cannot handle large floating point calculation due to fixed point conversion.
Date Wed, 12 Sep 2018 21:29:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612634#comment-16612634 ]

Benjamin Bannier edited comment on MESOS-9227 at 9/12/18 9:28 PM:
------------------------------------------------------------------

I believe that, to some degree, the way our fixed point math truncates away small fractions
has prevented exactness issues for smaller values.

Since scalar resource values are stored as {{double}} internally in the {{Resource}} message,
they can only hold around 15 significant digits. We want to guarantee correct fixed point
math with up to three decimal places, which leaves about 12 digits for the integer part, so
we can represent values exactly up to around 10¹² kB = 1 PB.
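
A minimal C++ sketch of the precision limit described above. The helpers {{toFixed}} and
{{thirdDecimalSurvives}} are hypothetical names that only mimic a three-decimal fixed point
conversion; this is not the code in {{values.cpp}}.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: scale by 1000 and round, keeping three decimal places.
long long toFixed(double value) {
  assert(std::isfinite(value));
  return std::llround(value * 1000.0);
}

// True if adding 0.001 to `base` is still visible after the conversion.
bool thirdDecimalSurvives(double base) {
  return toFixed(base + 0.001) == toFixed(base) + 1;
}
```

With IEEE 754 doubles, {{thirdDecimalSurvives(1e12)}} holds, while {{thirdDecimalSurvives(1e14)}}
does not, because adjacent doubles near 10¹⁴ are about 0.016 apart.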

Such an amount of {{disk}} is unfortunately not unrealistic even for a single agent, where
we might already run into correctness issues, but it should be possible to, e.g., warn users
that agent resources might not be representable. The issue is worse if the total capacity
of {{disk}} in the cluster reaches petabyte scale (either with a few agents with huge but
representable disks, or with many agents with considerable disks). The sum of {{disk}} might
not be representable in the master even though each agent stays below the obviously problematic
threshold, which makes such issues harder to diagnose.
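
The aggregation problem can be sketched in C++ as follows. All names here ({{toFixed}},
{{fromFixed}}, {{sumFixed}}, {{survivesDouble}}) are hypothetical, and the sketch assumes the
total is summed in fixed point and then stored back into a {{double}}; it is not taken from
the Mesos sources.

```cpp
#include <cmath>
#include <vector>

// Hypothetical three-decimal fixed point conversion and its inverse.
long long toFixed(double value) { return std::llround(value * 1000.0); }
double fromFixed(long long fixed) { return static_cast<double>(fixed) / 1000.0; }

// Sums the per-agent values in fixed point, i.e. with exact integer arithmetic.
long long sumFixed(const std::vector<double>& agents) {
  long long total = 0;
  for (double disk : agents) total += toFixed(disk);
  return total;
}

// True if the fixed point value is unchanged after being stored as a double.
bool survivesDouble(long long fixed) {
  return toFixed(fromFixed(fixed)) == fixed;
}
```

For 100 agents that each report {{1e12 + 0.001}}, every individual value survives the round
trip through {{double}}, but the fixed point total does not.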

A possible short-term mitigation might be to store disk resources in GB instead of kB, which
would buy us a couple of orders of magnitude at the cost of being unable to represent values
smaller than around 1 MB.


was (Author: bbannier):
I believe to some degree the way our fixed point math truncates away small fractions has prevented
exactness issues for smaller values.

Since scalar resource values are stored as {{double}} internally in the {{Resource}} message,
they can only hold around 15 significant digits. We want to guarantee correct fixed point
math with up to three decimal places, so we can represent values exactly up to around 10¹¹
kB = 0.1 PB.

Such an amount of {{disk}} is unfortunately not unrealistic even for a single agent where
we might already run into correctness issues, but it should be possible to e.g., warn users
that agent resources might not be representable. The issue is worse if the total capacity
of {{disk}} in the cluster reaches petabyte scale (either with some agents with huge, but
representable disks, or many agents with considerable disks). The sum of {{disk}} might be
not representable in the master, but would be below the obviously problematic threshold for
each agent, making it harder to diagnose such issues.

A possible short term mitigation might be to store disk resources in GB instead of kB which
would buy us a couple magnitudes at the cost of being unable to represent values less than
around 1 MB.

> `Value::Scalar` cannot handle large floating point calculation due to fixed point conversion.
> ---------------------------------------------------------------------------------------------
>
>                 Key: MESOS-9227
>                 URL: https://issues.apache.org/jira/browse/MESOS-9227
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Meng Zhu
>            Priority: Blocker
>
> While `scalar` holds a `double`, internally we convert floating point to fixed point
> to ensure only three decimal digits:
> https://github.com/apache/mesos/blob/851ec9c5dca672ed4efc77545c86121463695e4f/src/common/values.cpp#L48-L53
> And all internal arithmetic calculations are done using `long long`, e.g.:
> https://github.com/apache/mesos/blob/851ec9c5dca672ed4efc77545c86121463695e4f/src/common/values.cpp#L123-L128
> This has the unexpected consequence of the inability to handle large values. One impacted
> use case we are seeing is with exabytes of disks. This will overflow the fixed point representation.
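
To illustrate the overflow mentioned in the description, here is a minimal C++ sketch of a
range-checked conversion. `toFixedChecked` is a hypothetical name and signature, not a
function in `values.cpp`; the sketch only assumes that the fixed point value is the scalar
multiplied by 1000 and stored in a `long long`.

```cpp
#include <cmath>

// Hypothetical checked conversion: returns false instead of overflowing.
bool toFixedChecked(double value, long long* out) {
  const double scaled = value * 1000.0;
  // 2^63 as a double; every finite long long is strictly below it.
  const double limit = 9223372036854775808.0;
  // The negated form also rejects NaN, which fails both comparisons.
  if (!(scaled >= -limit && scaled < limit)) return false;
  *out = std::llround(scaled);
  return true;
}
```

Under this sketch, any scalar above roughly 9.2 × 10¹⁵ (in whatever unit the resource uses)
is rejected rather than converted.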



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
