ignite-dev mailing list archives

From Andrey Gura <ag...@apache.org>
Subject Re: Cache Metrics
Date Tue, 25 Jul 2017 13:53:15 GMT

doesn't make sense from my point of view. And we create a new problem:
how should we aggregate these metrics when a user requests metrics for a
cluster group?
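
A minimal sketch of the aggregation problem, assuming each node exposes a snapshot with an operation count and a total time. NodeStats and GroupAggregation are hypothetical names for illustration, not Ignite API, and the numbers are made up. If a client records end-to-end time and a server records local time for the same operations, a single average over the cluster group counts every operation twice and blends two different measurements:

```java
import java.util.Arrays;
import java.util.List;

final class GroupAggregation {
    /** Hypothetical per-node snapshot of get() metrics; not an Ignite class. */
    static final class NodeStats {
        final boolean client;   // true if the snapshot comes from a client node
        final long gets;        // number of get operations recorded on the node
        final long totalNanos;  // total time recorded for those operations

        NodeStats(boolean client, long gets, long totalNanos) {
            this.client = client;
            this.gets = gets;
            this.totalNanos = totalNanos;
        }
    }

    /** Weighted average over every snapshot, ignoring the node role. */
    static double naiveAvgNanos(List<NodeStats> nodes) {
        long ops = 0;
        long nanos = 0;
        for (NodeStats n : nodes) {
            ops += n.gets;
            nanos += n.totalNanos;
        }
        return ops == 0 ? 0.0 : (double) nanos / ops;
    }

    /** Weighted average restricted to client snapshots or to server snapshots. */
    static double avgNanosForRole(List<NodeStats> nodes, boolean client) {
        long ops = 0;
        long nanos = 0;
        for (NodeStats n : nodes) {
            if (n.client == client) {
                ops += n.gets;
                nanos += n.totalNanos;
            }
        }
        return ops == 0 ? 0.0 : (double) nanos / ops;
    }

    public static void main(String[] args) {
        // The same 100 gets, seen once by the client and once by the server.
        List<NodeStats> group = Arrays.asList(
            new NodeStats(true, 100, 50_000_000L),    // client: end-to-end time
            new NodeStats(false, 100, 10_000_000L));  // server: local time only

        System.out.println("naive avg ns:  " + naiveAvgNanos(group));
        System.out.println("client avg ns: " + avgNanosForRole(group, true));
        System.out.println("server avg ns: " + avgNanosForRole(group, false));
    }
}
```

The naive average matches neither the client view nor the server view, so a cluster group query would have to pick one role or report the two separately.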

On Mon, Jul 24, 2017 at 8:48 PM, Denis Magda <dmagda@apache.org> wrote:
> Guys,
> What if we calculate it on both sides? The client will keep the total time needed to
> complete an operation, including network hops, while a server (primary or backup) will count
> only local time.
> —
> Denis
>> On Jul 17, 2017, at 7:07 AM, Andrey Gura <agura@apache.org> wrote:
>> Hi,
>> I believe that the first solution is better than the second because it
>> takes into account network communication time. Average time of
>> communication between nodes doesn't make sense from my point of view.
>> So I vote for #1.
>> On Thu, Jul 13, 2017 at 11:52 PM, Вячеслав Коптилин
>> <slava.koptilin@gmail.com> wrote:
>>> Hi Experts,
>>> I am working on https://issues.apache.org/jira/browse/IGNITE-3495
>>> A few words about this issue:
>>> The issue is that cache metrics are gathered and updated inconsistently
>>> in some cases.
>>> Consider a simple topology with only two nodes: the first is a client
>>> node and the second is a server.
>>> The client node sends requests to the server node, for instance
>>> cache.put(), cache.putAll(), cache.get(), etc.
>>> In that case, metrics which are related to counters (cache hits, cache
>>> misses, removals and puts) are calculated on the server side,
>>> while time metrics are updated on the client node.
>>> I think that both metrics (counters and time) should be calculated on the
>>> same node. So, there are two obvious solutions:
>>> #1 The node that starts an operation is responsible for updating the cache
>>> metrics.
>>> Pro:
>>> - it yields more accurate metrics.
>>> Contra:
>>> - this approach does not work in particular cases, for example, a partitioned
>>> cache with FULL_ASYNC write synchronization mode.
>>> - needs to extend response messages (GridNearAtomicUpdateResponse,
>>> GridNearGetResponse, etc.)
>>>  in order to provide additional information from the remote node: cache hits,
>>> number of removals, etc.
>>>  So, it will put additional pressure on the communication channel.
>>> Perhaps this impact will be small: 4 bytes per message or something like
>>> that.
>>> - backward incompatibility (this is a consequence of the previous point)
>>> #2 The primary node (the node that actually executes a request) is
>>> responsible for updating the cache metrics.
>>> Pro:
>>> - easy to implement
>>> - backward compatible
>>> Contra:
>>> - time metrics will not include the time of communication between nodes, so
>>> the results will be less accurate.
>>> - perhaps we need to provide an additional metric that reports the average
>>> time of communication between nodes.
>>> Please let me know about your thoughts.
>>> Perhaps, both alternatives are not so good...
>>> Regards,
>>> Slava.
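
The two options above can be sketched roughly as follows. This is an illustration under stated assumptions, not Ignite code: MetricsPlacementSketch, Metrics, GetResponse, Primary and Initiator are hypothetical names, the "remote" call is a plain method call, and no network is simulated. In a real cluster the call from the initiator to the primary crosses the network, so the time recorded under option #1 would include communication, while the time recorded under option #2 would not.

```java
import java.util.HashMap;
import java.util.Map;

final class MetricsPlacementSketch {
    /** Counters and time kept together on whichever node owns the metrics. */
    static final class Metrics {
        long hits;
        long misses;
        long getNanos;

        void onGet(boolean hit, long nanos) {
            if (hit) hits++; else misses++;
            getNanos += nanos;
        }
    }

    /** Hypothetical response; the hit flag is the extra payload option #1 needs. */
    static final class GetResponse {
        final String value;
        final boolean hit;

        GetResponse(String value, boolean hit) {
            this.value = value;
            this.hit = hit;
        }
    }

    /** Stand-in for the primary node that actually executes the request. */
    static final class Primary {
        final Map<String, String> store = new HashMap<>();
        final Metrics metrics = new Metrics();

        /** With recordOnPrimary == true this behaves like option #2. */
        GetResponse get(String key, boolean recordOnPrimary) {
            long start = System.nanoTime();
            String value = store.get(key);
            GetResponse resp = new GetResponse(value, value != null);
            if (recordOnPrimary)
                metrics.onGet(resp.hit, System.nanoTime() - start); // local time only
            return resp;
        }
    }

    /** Stand-in for the node that starts the operation (option #1). */
    static final class Initiator {
        final Metrics metrics = new Metrics();
        final Primary primary;

        Initiator(Primary primary) {
            this.primary = primary;
        }

        String get(String key) {
            long start = System.nanoTime();
            GetResponse resp = primary.get(key, false); // would be a network round trip
            // Counters come from the response; time covers the whole call.
            metrics.onGet(resp.hit, System.nanoTime() - start);
            return resp.value;
        }
    }

    public static void main(String[] args) {
        Primary primary = new Primary();
        primary.store.put("k", "v");
        Initiator initiator = new Initiator(primary);
        initiator.get("k");
        initiator.get("absent");
        System.out.println("initiator hits=" + initiator.metrics.hits
            + " misses=" + initiator.metrics.misses);
    }
}
```

Under option #1 the counters reach the initiator only because the response carries them, which is the message extension listed under "Contra" above.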
