cassandra-dev mailing list archives

From Владимир Бухтояров <jseco...@mail.ru.INVALID>
Subject Re[3]: Histogram error "Unable to compute ceiling for max when histogram overflowed"
Date Thu, 20 Oct 2016 15:06:04 GMT
I have investigated the problem and found that monitoring has changed significantly since 3.7 (the version
in which I got the exception in com.codahale.metrics.servlets.MetricsServlet). Since version 3.9 it
is enough to change the behavior of DecayingEstimatedHistogramReservoir; EstimatedHistogram
can stay unchanged. Modifying DecayingEstimatedHistogramReservoir is safe
because, unlike EstimatedHistogram, DecayingEstimatedHistogramReservoir is not
used for Cassandra's internal needs.
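
A minimal sketch of the clamping behavior under discussion, assuming a simplified
bucketed histogram with one extra overflow bucket. The class ClampingHistogramSketch
and its methods are hypothetical names for illustration only; this is not the actual
DecayingEstimatedHistogramReservoir code, and it omits decay entirely. It only shows
how an overflowed sample can be reported as the maximum trackable value instead of
raising IllegalStateException.

```java
import java.util.Arrays;

/** Hypothetical, simplified stand-in; NOT Cassandra's actual class. */
final class ClampingHistogramSketch {
    // Inclusive upper bound of each regular bucket, in ascending order.
    private final long[] upperBounds;
    // One counter per regular bucket, plus a final counter for overflowed samples.
    private final long[] counts;

    ClampingHistogramSketch(long[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    long maxTrackableValue() {
        return upperBounds[upperBounds.length - 1];
    }

    void update(long value) {
        int idx = Arrays.binarySearch(upperBounds, value);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first bucket whose bound is >= value
        }
        counts[idx]++;      // idx == upperBounds.length means the overflow bucket
    }

    boolean isOverflowed() {
        return counts[counts.length - 1] > 0;
    }

    /** Upper bound of the bucket holding the quantile; overflow is clamped, never thrown. */
    long percentile(double quantile) {
        if (quantile < 0.0 || quantile > 1.0) {
            throw new IllegalArgumentException("quantile must be in [0, 1]");
        }
        long total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0; // no samples recorded
        }
        long threshold = Math.max(1, (long) Math.ceil(quantile * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= threshold) {
                return i < upperBounds.length ? upperBounds[i] : maxTrackableValue();
            }
        }
        return maxTrackableValue();
    }

    /** Coarse mean from bucket upper bounds; overflowed samples count as the max trackable value. */
    double mean() {
        long total = 0;
        double weighted = 0;
        for (int i = 0; i < counts.length; i++) {
            long bound = i < upperBounds.length ? upperBounds[i] : maxTrackableValue();
            total += counts[i];
            weighted += (double) counts[i] * bound;
        }
        return total == 0 ? 0.0 : weighted / total;
    }
}
```

With bucket bounds in seconds of {1, 60, 3600, 10800} (a 3h maximum), recording a
single 4h latency (14400) makes percentile(0.999) return 10800, while isOverflowed()
still lets a caller detect that the true value was larger.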

I also found the resolution of CASSANDRA-12185 very strange: nothing was done to prevent
the IllegalStateException, yet the issue is closed. Should I reopen #12185, or submit a pull request
under a new issue?


Best regards,
Bukhtoyarov Vladimir
email jsecoder@mail.ru
skype live:fanat-tdd
Github: https://github.com/vladimir-bukhtoyarov
mobile +79618096798

>Wednesday, 19 October 2016, 21:12 +03:00, from Владимир Бухтояров <jsecoder@mail.ru.INVALID>:
>
>Null (zero) values in a snapshot are useless for problem analysis, because it is impossible
>to distinguish the case where there were no events from the case where events were dispatched too
>slowly. I see nothing wrong with returning the 999th percentile as 3h when the histogram is configured
>with a 3h max and some latency is 4h.
>
>
>Best regards,
>Bukhtoyarov Vladimir
>email  jsecoder@mail.ru
>skype live:fanat-tdd
>Github:  https://github.com/vladimir-bukhtoyarov
>mobile  +79618096798
>
>>Wednesday, 19 October 2016, 20:17 +03:00, from Ken Hancock <ken.hancock@schange.com>:
>>
>>I would suggest metrics should return null values instead of false values.
>>
>>On Wed, Oct 19, 2016 at 12:21 PM, Владимир Бухтояров <
>> jsecoder@mail.ru.invalid > wrote:
>>
>>>
>>> Hi to all,
>>>
>>> I want to fix https://issues.apache.org/jira/browse/CASSANDRA-11063
>>> This issue is very painful for me, because when something runs slowly it
>>> is impossible to capture metrics and save them to a monitoring database for
>>> future investigation. Moreover, when one histogram throws an exception, many
>>> metrics exporters (for example MetricsServlet) are unable to export metrics
>>> for the whole MetricRegistry, so when an overflow happens in one histogram I
>>> have no historical data at all.
>>>
>>> I propose to implement the following changes:
>>> 1. DecayingEstimatedHistogramReservoir and EstimatedHistogram will
>>> return the maximum trackable value instead of Long.MAX_VALUE.
>>> 2. DecayingEstimatedHistogramReservoir and EstimatedHistogram will
>>> never throw IllegalStateException; instead, they will use the maximum trackable
>>> value as a regular value in percentile and average calculations.
>>> 3. If anybody wants to keep the old behavior (preferring a crash to
>>> inaccurate reporting), I can add a configuration parameter to preserve
>>> it. I can even leave the old behavior as the default; for my
>>> needs it is enough to have some option to avoid crashes.
>>>
>>>
>>> Best regards,
>>> Bukhtoyarov Vladimir
>>> email  jsecoder@mail.ru
>>> skype live:fanat-tdd
>>> Github:  https://github.com/vladimir-bukhtoyarov
>>>
>
