ignite-dev mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: Cache Metrics
Date Mon, 24 Jul 2017 17:48:18 GMT

What if we calculate it on both sides? The client will keep the total time needed to complete
an operation, including network hops, while a server (primary or backup) will count only the local
processing time.
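A minimal sketch of this "both sides" idea, assuming a hypothetical TimeMetric accumulator (not Ignite's actual metrics classes) and simulated durations instead of a real cluster. The server records only its local work, the client records the full round trip, and the gap between the two averages gives a rough estimate of the network cost.

```java
import java.util.concurrent.atomic.LongAdder;

public class DualSideTiming {

    /** Hypothetical accumulator for one time metric: sum of durations and sample count. */
    static final class TimeMetric {
        private final LongAdder totalNanos = new LongAdder();
        private final LongAdder samples = new LongAdder();

        void record(long nanos) {
            totalNanos.add(nanos);
            samples.increment();
        }

        double averageNanos() {
            long n = samples.sum();
            return n == 0 ? 0.0 : (double) totalNanos.sum() / n;
        }
    }

    /**
     * Simulates one remote put. The server (primary or backup) records only its local
     * work; the client records the whole round trip, i.e. network hops plus that work.
     * Durations are passed in instead of measured so the example is deterministic.
     */
    static void put(TimeMetric clientTotal, TimeMetric serverLocal,
                    long networkNanos, long localWorkNanos) {
        serverLocal.record(localWorkNanos);                  // server-side metric
        clientTotal.record(networkNanos + localWorkNanos);   // client-side metric
    }

    public static void main(String[] args) {
        TimeMetric clientTotal = new TimeMetric();
        TimeMetric serverLocal = new TimeMetric();

        put(clientTotal, serverLocal, 800, 200);
        put(clientTotal, serverLocal, 600, 400);

        // The gap between the two averages approximates the communication overhead.
        double overhead = clientTotal.averageNanos() - serverLocal.averageNanos();
        System.out.printf("client avg=%.1f ns, server avg=%.1f ns, network avg=%.1f ns%n",
                clientTotal.averageNanos(), serverLocal.averageNanos(), overhead);
    }
}
```

With these made-up durations the client average is 1000 ns, the server average is 300 ns, and the estimated network share is 700 ns. In a real node the durations would come from something like System.nanoTime() around the operation on each side.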


> On Jul 17, 2017, at 7:07 AM, Andrey Gura <agura@apache.org> wrote:
> Hi,
> I believe that the first solution is better than the second because it
> takes into account network communication time. Average time of
> communication between nodes doesn't make sense from my point of view.
> So I vote for #1.
> On Thu, Jul 13, 2017 at 11:52 PM, Вячеслав Коптилин
> <slava.koptilin@gmail.com> wrote:
>> Hi Experts,
>> I am working on https://issues.apache.org/jira/browse/IGNITE-3495
>> A few words about this issue:
>> The problem is that the process of gathering/updating cache metrics is
>> inconsistent in some cases.
>> Let's consider the following simple topology, which contains only two nodes:
>> the first node is a client node and the second is a server.
>> The client node sends requests to the server node, for instance
>> cache.put(), cache.putAll(), cache.get(), etc.
>> In that case, counter metrics (cache hits, cache
>> misses, removals and puts) are calculated on the server side,
>> while time metrics are updated on the client node.
>> I think that both kinds of metrics (counters and time) should be calculated on the
>> same node. So, there are two obvious solutions:
>> #1 The node that initiates an operation is responsible for updating the cache
>> metrics.
>> Pro:
>> - it gives more accurate metric values.
>> Contra:
>> - this approach does not work in some cases, for example, a partitioned
>> cache with FULL_ASYNC write synchronization mode (the initiating node does
>> not wait for responses, so it has nothing to update the counters from).
>> - response messages (GridNearAtomicUpdateResponse, GridNearGetResponse, etc.)
>> need to be extended to carry additional information from the remote node:
>> cache hits, number of removals, etc.
>> This will add pressure on the communication channel, although the impact
>> will probably be small: 4 bytes per message or something like that.
>> - backward incompatibility (this is a consequence of the previous point)
>> #2 The primary node (the node that actually executes a request) is responsible
>> for updating the cache metrics.
>> Pro:
>> - easy to implement
>> - backward compatible
>> Contra:
>> - time metrics will not include the time of communication between nodes, so
>> the results will be less accurate.
>> - perhaps we need to provide an additional metric for the average
>> time of communication between nodes.
>> Please let me know your thoughts.
>> Perhaps neither alternative is good enough...
>> Regards,
>> Slava.
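For comparison, a minimal sketch of alternative #1 from the quoted message, where the node that initiates an operation updates both the counters and the timings. PrimaryNode, InitiatorNode and GetResponse are hypothetical stand-ins, not Ignite's GridNearGetResponse or its real metrics API; the boolean hit flag models the extra information the primary would have to add to its response, and the remote call is simulated with a direct method call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

public class InitiatorMetrics {

    /** Hypothetical response: the value plus one extra flag saying whether the primary had the key. */
    record GetResponse(String value, boolean hit) {}

    /** Hypothetical primary node: executes requests but does not update any metrics itself. */
    static final class PrimaryNode {
        private final Map<String, String> data = new HashMap<>();

        void put(String key, String value) {
            data.put(key, value);
        }

        GetResponse get(String key) {
            String value = data.get(key);
            return new GetResponse(value, value != null);
        }
    }

    /** Hypothetical initiating (client) node: owns counters and timings for the operations it starts. */
    static final class InitiatorNode {
        final LongAdder hits = new LongAdder();
        final LongAdder misses = new LongAdder();
        final LongAdder totalGetNanos = new LongAdder();
        private final PrimaryNode primary;

        InitiatorNode(PrimaryNode primary) {
            this.primary = primary;
        }

        String get(String key) {
            long start = System.nanoTime();
            GetResponse resp = primary.get(key);            // stands in for a network request/response
            totalGetNanos.add(System.nanoTime() - start);   // in a real cluster this includes network hops

            // Counters are derived from the flag carried in the response,
            // so they live on the same node as the time metric.
            if (resp.hit()) {
                hits.increment();
            } else {
                misses.increment();
            }
            return resp.value();
        }
    }

    public static void main(String[] args) {
        PrimaryNode primary = new PrimaryNode();
        primary.put("a", "1");

        InitiatorNode client = new InitiatorNode(primary);
        client.get("a"); // hit
        client.get("b"); // miss

        System.out.println("hits=" + client.hits.sum() + ", misses=" + client.misses.sum()
                + ", totalGetNanos=" + client.totalGetNanos.sum());
    }
}
```

The drawback raised in the thread shows up directly here: the counters depend on a response coming back. In FULL_ASYNC mode, where the initiator does not wait for responses, there would be no hit flag to read.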
