cassandra-user mailing list archives

From Tyler Hobbs <ty...@datastax.com>
Subject Re: Slow down of secondary index query with VNODE (C* version 1.2.18, jre6).
Date Fri, 19 Sep 2014 22:28:43 GMT
On Fri, Sep 19, 2014 at 4:53 PM, Jay Patel <pateljay3001@gmail.com> wrote:

>
> When the coordinator fires an indexed scan request to node 192.168.51.22, why
> doesn't it ask that node to check all of its (at least primary) ranges for
> the queried data at once? Also, internally that node should be able to
> do just one scan through all of the ranges it holds, shouldn't it?
> (e.g. [min(-9223372036854775808), max(-9193352069377957523)] and
> (max(-9136021049555745100), max(-8959555493872108621)], etc.)
>
> It seems it needs to query data in token order. So,
> [min(-9223372036854775808), max(-9193352069377957523)] is on 192.168.51.22.
> But the next range, (max(-9193352069377957523), max(-9136021049555745100)],
> is on 192.168.51.25, so it fires the query there. Then the next range,
> (max(-9136021049555745100), max(-8959555493872108621)], is again on
> 192.168.51.22. Btw, I'm not too sure about min/max vs. max/max in the trace
> output.
>

The coordinator certainly could batch multiple range requests that are
going to the same replica.  It's an optimization that would primarily help
the empty table/high cardinality case, but you're welcome to open a
ticket.  3.0 is the earliest this would make it in.
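
To make the idea concrete, here is a minimal sketch of what batching by
replica could look like. It is not Cassandra's coordinator code: the
function name `batch_by_replica` and the `RANGES` list are hypothetical,
and the range/replica pairs are simply copied from the trace quoted in
this thread.

```python
from collections import OrderedDict

# (token range, replica) pairs in token order, copied from the trace in this thread.
RANGES = [
    ((-9223372036854775808, -9193352069377957523), "192.168.51.22"),
    ((-9193352069377957523, -9136021049555745100), "192.168.51.25"),
    ((-9136021049555745100, -8959555493872108621), "192.168.51.22"),
    ((-8959555493872108621, -8929774302283364912), "192.168.51.25"),
]


def batch_by_replica(ranges):
    """Group token ranges by the replica that would serve them.

    Hypothetical helper: returns an OrderedDict mapping replica -> list of
    ranges, keeping each replica's ranges in their original token order.
    """
    batches = OrderedDict()
    for token_range, replica in ranges:
        batches.setdefault(replica, []).append(token_range)
    return batches


if __name__ == "__main__":
    for replica, token_ranges in batch_by_replica(RANGES).items():
        print(replica, len(token_ranges), "ranges in one request")
```

With the four ranges from the trace, this would turn four sequential
requests into two, one per node. The coordinator would still have to
reassemble the responses in token order before returning them.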


>
> I found the comment below in
> https://issues.apache.org/jira/browse/CASSANDRA-4858.
> "The problem is that we have to scan the nodes in token order so we dont
> break the existing API's, if we do so then we are sending a lot more
> requests and waiting for the response than the number of nodes. "
> I don't understand the restriction, though: "don't break the existing API's".
>

I think he's just saying that we have to make sure we return results in
token order (and if there's a limit on the query, return the first N
results when listed in token order).
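
As a rough illustration of that constraint (again a sketch, not the
actual implementation; `first_n_in_token_order` and its input shape are
assumptions made for the example), the coordinator can gather per-range
results in any order, but has to walk them by range start token before
applying the limit:

```python
from itertools import islice


def first_n_in_token_order(results_by_range, limit):
    """Return the first `limit` rows when all rows are listed in token order.

    Hypothetical input: a dict mapping (start_token, end_token) to a list of
    (token, row) pairs that is already sorted by token within that range.
    Ranges are assumed not to overlap.
    """
    # Sorting the range keys orders them by start token.
    ordered_ranges = sorted(results_by_range)
    rows = (row for rng in ordered_ranges for row in results_by_range[rng])
    return list(islice(rows, limit))


if __name__ == "__main__":
    # Responses may arrive out of order; here the second range came back first.
    responses = {
        (0, 10): [(1, "a"), (5, "b")],
        (-10, 0): [(-7, "z")],
    }
    print(first_n_in_token_order(responses, 2))
```

The limit is why later ranges can sometimes be skipped entirely: once
the earlier ranges have produced N rows, nothing from a later range can
be part of the answer.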


>
> Without vnodes, it queries a particular node only once. Btw, in
> the worst case, I understand a secondary index query sometimes has to scan
> all the nodes in the cluster (empty table or high-cardinality index?), but I
> don't understand why vnodes make it scan the *same node* multiple
> times. I see this behavior even when RF is 1.
>
> >> Snippet from output1.txt attached earlier:
> Executing indexed scan for [min(-9223372036854775808),
> max(-9193352069377957523)] | 23:11:30,992 | 192.168.51.22 |
> Executing indexed scan for (max(-9193352069377957523),
> max(-9136021049555745100)] | 23:11:30,998 | 192.168.51.25 |
> Executing indexed scan for (max(-9136021049555745100),
> max(-8959555493872108621)] | 23:11:30,999 | 192.168.51.22 |
> Executing indexed scan for (max(-8959555493872108621),
> max(-8929774302283364912)] | 23:11:31,000 | 192.168.51.25 |
>

I'm not sure how your question here is different from the one above.




-- 
Tyler Hobbs
DataStax <http://datastax.com/>
