cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-14660) Improve TokenMetaData cache populating performance for large cluster
Date Thu, 23 Aug 2018 22:27:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16590859#comment-16590859
] 

Benedict edited comment on CASSANDRA-14660 at 8/23/18 10:26 PM:
----------------------------------------------------------------

bq. In HasMultimap the value collection is a HashSet, which does not work for us. E.g getting
tokens for a specific endpoints will not give us sorted token anymore.

{{MultimapBuilder}} can be used to construct a {{Multimap}} with hashed keys and {{TreeSet}}
values.  We would not want to use the {{build(Multimap<K, V>)}} method, as it invokes
{{put(K, V)}} for each entry, but if we instead iterated over {{asMap().entrySet()}} and added
each {{TreeSet}} individually, we should get the linear complexity construction of the {{TreeSet}}
containers.

Thinking about it, it's probable this change alone (iterating over {{asMap().entrySet()}})
would be sufficient to restore adequate performance, even if we continued to maintain a {{TreeMap}}
for the key lookup (i.e. continued to use a {{TreeMultimap}}).  This should be a 3 line change,
in {{SortedBiMultiValMap}} and tide us over until we can properly overhaul {{TokenMetadata}},
and have no wider semantic implications to worry about.


was (Author: benedict):
bq. In HasMultimap the value collection is a HashSet, which does not work for us. E.g getting
tokens for a specific endpoints will not give us sorted token anymore.

{{MultimapBuilder}} can be used to construct a {{Multimap}} with hashed keys and {{TreeSet}}
values.  We would not want to use the {{build(Multimap<K, V>)}} method, as it invokes
{{put(K, V)}} for each entry, but if we instead iterated over {{asMap()}} and added each {{TreeSet}}
individually, we should get the linear complexity construction of the {{TreeSet}} containers.

Thinking about it, it's probable this change alone would be sufficient to restore adequate
performance, even if we continued to maintain a {{TreeMap}} for the key lookup (i.e. continued
to use a TreeMultimap).  This should be a 3 line change, in {{SortedBiMultiValMap}} and tide
us over until we can properly overhaul {{TokenMetadata}}, and have no wider semantic implications
to worry about.

> Improve TokenMetaData cache populating performance for large cluster
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-14660
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14660
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>         Environment: Benchmark is on MacOSX 10.13.5, 2017 MBP
>            Reporter: Pengchao Wang
>            Priority: Critical
>              Labels: Performance
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>         Attachments: 14660-trunk.txt, TokenMetaDataBenchmark.java
>
>
> TokenMetaData#cachedOnlyTokenMap is a method C* used to get a consistent token and topology
view on coordinations without paying read lock cost. Upon first read the method acquire a
synchronize lock and generate a copy of major token meta data structures and cached it, and
upon every token meta data changes(due to gossip changes), the cache get cleared and next
read will taking care of cache population.
> For small to medium size clusters this strategy works pretty well. But large clusters
can actually be suffered from the locking since cache populating is much slower. On one of
our largest cluster (~1000 nodes,  125k tokens, C* 3.0.15)  each cache population take
about 500~700ms, and during that there are no requests can go through since synchronize lock
was acquired. This caused waves of timeouts errors when there are large amount gossip messages
propagating cross the cluster, such as in the case of cluster restarting.
> Base on profiling we found that the cost mostly comes from copying tokenToEndpointMap.
It is a SortedBiMultiValueMap made from a forward map use TreeMap and a reverse map use guava
TreeMultiMap. There is an optimization in TreeMap helps reduce copying complexity from O(N*log(N)) to
O(N) when copying from already ordered data. But guava's TreeMultiMap copying missed that
optimization and make it ~10 times slower than it actually need to be on our size of cluster.
> The patch attached to the issue replace the reverse TreeMultiMap<K, V> to a vanilla
TreeMap<K, TreeSet<V>> in SortedBiMultiValueMap to make sure we can copy it O(N)
time.
> I also attached a benchmark script (TokenMetaDataBenchmark.java), which simulates a large
cluster then measures average latency for TokenMetaData cache populating.
> Benchmark result before and after that patch:
> {code:java}
> trunk: 
> before 100ms, after 13ms
> 3.0.x: 
> before 199ms, after 15ms
>  {code}
> (On 3.0.x even the forward TreeMap copying is slow, the O(N*log(N)) to O(N) optimization
is not applied because the key comparator is dynamically created and TreeMap cannot determine
the source and dest are in same order)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message