cassandra-commits mailing list archives

From "Igor Zubchenok (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10050) Secondary Index Performance Dependent on TokenRange Searched in Analytics
Date Wed, 16 Aug 2017 00:54:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128163#comment-16128163
] 

Igor Zubchenok edited comment on CASSANDRA-10050 at 8/16/17 12:53 AM:
----------------------------------------------------------------------

Is there any chance to have it resolved in coming releases?

From my point of view it looks like an -architecture design- implementation problem, because
if the index entries are sorted by token, why can't a binary search be used to find the start token?
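
A minimal sketch of the idea in this comment, under loud assumptions: a plain Python list stands in for a token-sorted index, and the function names are hypothetical. This is not Cassandra's actual index code; it only contrasts walking from the first entry with seeking to the start token by binary search.

```python
from bisect import bisect_right


def scan_from_start(sorted_tokens, min_token, max_token):
    """Walk the index from its first entry.

    Returns (matches, entries_examined) for min_token < token <= max_token.
    """
    matches, examined = [], 0
    for token in sorted_tokens:
        examined += 1
        if token > max_token:
            break  # past the end of the requested range
        if token > min_token:
            matches.append(token)
    return matches, examined


def seek_with_binary_search(sorted_tokens, min_token, max_token):
    """Binary-search to the first token > min_token, then walk forward."""
    start = bisect_right(sorted_tokens, min_token)
    matches, examined = [], 0
    for i in range(start, len(sorted_tokens)):
        examined += 1
        if sorted_tokens[i] > max_token:
            break
        matches.append(sorted_tokens[i])
    return matches, examined


# Hypothetical index with 100 entries: tokens 0, 10, ..., 990.
index_tokens = list(range(0, 1000, 10))

# A range near the end of the index: the linear walk touches every entry,
# while the binary-search variant touches only the entries in the range.
slow_matches, slow_examined = scan_from_start(index_tokens, 900, 990)
fast_matches, fast_examined = seek_with_binary_search(index_tokens, 900, 990)
```

In this toy model the work done by the linear walk grows with the position of the range in the index, while the binary-search variant depends only on the size of the range. Whether the real index layout permits such a seek is exactly the open question here.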


> Secondary Index Performance Dependent on TokenRange Searched in Analytics
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10050
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10050
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Secondary Indexes
>         Environment: Single node, macbook, 2.1.8
>            Reporter: Russell Spitzer
>             Fix For: 4.x
>
>
> In doing some test work on the Spark Cassandra Connector I saw some odd performance when
pushing down range queries with Secondary Index filters. When running the queries we see a huge
amount of time when the C* server is not doing any work and the query seems to be hanging.
This investigation led to the work in this document:
> https://docs.google.com/spreadsheets/d/1aJg3KX7nPnY77RJ9ZT-IfaYADgJh0A--nAxItvC6hb4/edit#gid=0
> The Spark Cassandra Connector builds up token-range-specific queries and allows the user
to push down relevant fields to C*. Here we have two indexed fields, (size) and (color), being
pushed down to C*.
> {code}
> SELECT count(*) FROM ks.tab WHERE token("store") > $min AND token("store") <= $max
AND color = 'red' AND size = 'P' ALLOW FILTERING;
> {code}
> These queries will have different token ranges inserted and executed as separate Spark
tasks. Spark tasks with token ranges near the Min(token) end up executing much faster than
those near Max(token), which also happen to throw errors.
> {code}
> Coordinator node timed out waiting for replica nodes' responses] message="Operation timed
out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1,
'consistency': 'ONE'}
> {code}
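
The token-range-specific queries described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the Spark Cassandra Connector's actual splitting logic: it assumes the Murmur3Partitioner token range of -2^63 to 2^63 - 1, splits it into equal-width ranges, and uses hypothetical helper names. Token bounds are formatted straight into the query text only to keep the sketch short.

```python
# Assumed token bounds (Murmur3Partitioner range).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

# Query shape taken from the issue description; {lo} and {hi} replace $min and $max.
QUERY_TEMPLATE = (
    'SELECT count(*) FROM ks.tab '
    'WHERE token("store") > {lo} AND token("store") <= {hi} '
    "AND color = 'red' AND size = 'P' ALLOW FILTERING;"
)


def split_token_ring(num_splits):
    """Split the full token range into contiguous (lo, hi] ranges of roughly equal width."""
    width = (MAX_TOKEN - MIN_TOKEN) // num_splits
    bounds = [MIN_TOKEN + i * width for i in range(num_splits)]
    bounds.append(MAX_TOKEN)  # last range absorbs the rounding remainder
    return list(zip(bounds[:-1], bounds[1:]))


def build_queries(num_splits):
    """Return one query string per token range, in ring order."""
    return [
        QUERY_TEMPLATE.format(lo=lo, hi=hi)
        for lo, hi in split_token_ring(num_splits)
    ]


ranges = split_token_ring(4)
queries = build_queries(4)
```

Each resulting query would run as its own task, so the first query covers the ranges near Min(token) and the last one the ranges near Max(token).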
> I took the queries and ran them through CQLSH to see the difference in time. Query time
grows linearly with the position at which the queried token range starts: only 2 seconds
for queries near the beginning of the full token spectrum and over 12 seconds at the end of
the spectrum.
> The question is, can this behavior be improved? Or should we not recommend using secondary
indexes with Analytics workloads?


