cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Secondary Index, performance , data type
Date Wed, 04 Jul 2012 09:52:18 GMT
> select my_cf where columnA = a and columnB = b and columnC = c and columnD = d
Cassandra will only use one equality clause to select the candidate rows. The other clauses
are then applied as filters to the rows selected by that first clause.

The clause used to select the candidate rows is chosen based on statistics that estimate the
number of columns in each index.
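
To make the two steps concrete, here is a rough sketch in Python. It is a toy in-memory model, not Cassandra's code: the `rows` data, the `build_index` and `index_scan` helpers, and the rule "prefer the clause with the fewest estimated matches" are all assumptions made for illustration.

```python
# Toy model of an index scan with several equality clauses.
# Everything here is hypothetical; it only mirrors the description above.
rows = {
    "k1": {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "d"},
    "k2": {"columnA": "a", "columnB": "b", "columnC": "x", "columnD": "d"},
    "k3": {"columnA": "a", "columnB": "y", "columnC": "c", "columnD": "d"},
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = {}
    for key, cols in rows.items():
        index.setdefault(cols[column], set()).add(key)
    return index


indexes = {col: build_index(rows, col)
           for col in ("columnA", "columnB", "columnC", "columnD")}


def index_scan(clauses):
    """clauses: dict of column -> value. Returns (primary column, matching keys)."""
    # Assumption: the clause with the smallest estimated result is preferred.
    def estimate(col):
        return len(indexes[col].get(clauses[col], ()))

    primary = min(clauses, key=estimate)
    candidates = indexes[primary].get(clauses[primary], set())

    # Every other clause is checked row by row against the candidates.
    matches = sorted(
        key for key in candidates
        if all(rows[key][c] == v for c, v in clauses.items() if c != primary)
    )
    return primary, matches


primary, matches = index_scan(
    {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "d"}
)
```

In this model the cost is driven by how many candidates the primary clause returns, since each candidate row has to be read and checked against the remaining clauses.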

> Do you have any ideas? Is there any way to understand how Cassandra internally runs the
> query (a kind of "explain plan")? 
The only way I know of to see the "query plan" is to set DEBUG logging on org.apache.cassandra.db.index.keys.KeysSearcher
and look for the message "Primary scan clause is "

Note, if this is a common query you may get better performance creating a custom secondary
index than using four equality clauses in an index scan.
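
As a rough illustration of what I mean by a custom index, here is a sketch in Python using plain dicts. It is only a model of the idea: `data`, `custom_index`, `write_row` and `query` are hypothetical names, and in a real cluster the index would be a second column family that your application maintains on every write.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins, not a Cassandra client API.
data = {}                        # stands in for my_cf: row key -> columns
custom_index = defaultdict(set)  # stands in for the index CF: (a, b, c, d) -> row keys

INDEXED = ("columnA", "columnB", "columnC", "columnD")


def index_key(cols):
    """Build the composite index key from the four queried columns."""
    return tuple(cols[c] for c in INDEXED)


def write_row(key, cols):
    """Write a row and keep the hand-maintained index in step with it."""
    old = data.get(key)
    if old is not None:
        # Remove the stale index entry before adding the new one.
        custom_index[index_key(old)].discard(key)
    data[key] = dict(cols)
    custom_index[index_key(cols)].add(key)


def query(a, b, c, d):
    """A single lookup replaces the index scan plus three filter clauses."""
    return sorted(custom_index.get((a, b, c, d), ()))
```

The trade-off is that the application has to update the index on every write, and this layout only serves queries that supply all four values.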

> 2/ Are there any limitations on the number of criteria we can usually have? 

None that I know of. The query will probably run slower the more clauses you have. 

> 3/ Even though we have different data types (date, string, int), we have stored them all as
> UTF8Type. Could we expect performance improvements if we use DateType, LongType?
No. The main issue is going to be the selectivity of the primary scan clause, followed by
the number of additional clauses. Their types will have little to no impact.

Hope that helps. 
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 3/07/2012, at 3:59 AM, Olivier Mallassi wrote:

> Hi all
> 
> We have 4 indexed columns, all configured as UTF8Type (even though one column is a date and
> another an integer). 
> 
> 1/ the read query we run can have up to 4 criteria
> select my_cf where columnA = a and columnB = b and columnC = c and columnD = d
> 
> This query is fast (<500ms) with up to 3 criteria, but when we add the fourth one, the
> execution time is 9.5s. 
> Do you have any ideas? Is there any way to understand how Cassandra internally runs the
> query (a kind of "explain plan")? 
> 
> 2/ Are there any limitations on the number of criteria we can usually have? 
> 
> 3/ Even though we have different data types (date, string, int), we have stored them all as
> UTF8Type. Could we expect performance improvements if we use DateType, LongType?
> 
> Many thx for all your answers. 
> 
> -- 
> ............................................................
> Olivier Mallassi
> OCTO Technology
> ............................................................
> 50, Avenue des Champs-Elysées
> 75008 Paris
> 
> Mobile: (33) 6 28 70 26 61
> Tél: (33) 1 58 56 10 00
> Fax: (33) 1 58 56 10 01
> 
> http://www.octo.com 
> Octo Talks! http://blog.octo.com
> 
> 

