cassandra-user mailing list archives

From Olivier Mallassi <omalla...@octo.com>
Subject Re: Secondary Index, performance , data type
Date Wed, 04 Jul 2012 10:45:37 GMT
Many thanks for the explanation, Aaron.


On Wednesday, July 4, 2012, aaron morton wrote:

> > select my_cf where columnA = a and columnB = b and columnC = c
> > and columnD = d
> Cassandra will only use one equality clause to select the candidate rows.
> The other clauses are then applied as filters to the rows selected by
> that first clause.
>
> The clause used to select candidate rows is chosen based on statistics
> that estimate the number of columns in the indexes.
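
The behaviour described above can be sketched roughly as follows. This is a simplified in-memory model, not Cassandra's actual code: `build_index`, `mean_entries` and `index_scan` are hypothetical names, and the "statistic" is assumed here to be the average number of row keys per indexed value.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Index = Dict[Any, List[str]]  # indexed value -> keys of rows holding that value


def build_index(rows: Dict[str, Row], column: str) -> Index:
    """Build a toy secondary index for one column."""
    index: Index = {}
    for key, row in rows.items():
        index.setdefault(row[column], []).append(key)
    return index


def mean_entries(index: Index) -> float:
    """Average number of row keys per indexed value (lower = more selective)."""
    return sum(len(keys) for keys in index.values()) / max(len(index), 1)


def index_scan(rows: Dict[str, Row],
               indexes: Dict[str, Index],
               clauses: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Pick one clause to fetch candidates, then filter with all clauses."""
    # Assumption: the clause whose index looks most selective is used first.
    primary = min(clauses, key=lambda column: mean_entries(indexes[column]))
    candidates = indexes[primary].get(clauses[primary], [])
    # Every candidate row is read and checked against every clause.
    matches = [key for key in candidates
               if all(rows[key].get(col) == val for col, val in clauses.items())]
    return primary, matches


rows = {
    "k1": {"columnA": "x", "columnB": 1},
    "k2": {"columnA": "x", "columnB": 2},
    "k3": {"columnA": "x", "columnB": 3},
    "k4": {"columnA": "y", "columnB": 1},
}
indexes = {column: build_index(rows, column) for column in ("columnA", "columnB")}
primary, matches = index_scan(rows, indexes, {"columnA": "x", "columnB": 1})
print(primary, matches)
```

In this model the cost is driven by how many candidate rows the first clause returns, since each of them has to be read before the remaining clauses can be checked.
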
>
> > Do you have any ideas? Is there any way to understand how Cassandra
> > internally runs the query (a kind of "explain plan")?
> The only way I know of to see the "query plan" is to set DEBUG logging on
> org.apache.cassandra.db.index.keys.KeysSearcher and look for the message
> "Primary scan clause is "
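
Once DEBUG logging is on for that class (on a log4j-based install this is assumed to be a line such as `log4j.logger.org.apache.cassandra.db.index.keys.KeysSearcher=DEBUG` in the server's log4j properties file), the message can be pulled out of the log with a small script. The sketch below only relies on the message text quoted above; the helper name and the sample log lines are made up for illustration, and the real line layout depends on your logging pattern.

```python
from typing import Iterable, List

MARKER = "Primary scan clause is"


def primary_scan_clauses(lines: Iterable[str]) -> List[str]:
    """Return whatever follows the marker on each log line that contains it."""
    found = []
    for line in lines:
        pos = line.find(MARKER)
        if pos != -1:
            found.append(line[pos + len(MARKER):].strip())
    return found


# Invented sample lines; only the marker text comes from the thread.
sample = [
    "DEBUG [ReadStage:1] KeysSearcher.java Primary scan clause is columnB",
    "INFO [main] some unrelated message",
]
print(primary_scan_clauses(sample))
```

To run it against a real file, pass an open file object, for example `primary_scan_clauses(open(path))`, where `path` points at your Cassandra system log.
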
>
> Note, if this is a common query you may get better performance creating a
> custom secondary index than using four equality clauses in an index scan.
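
One way to read "custom secondary index" is a lookup structure that the application maintains itself, keyed on the combination of all four values, so that the query becomes a single lookup instead of an index scan plus filtering. The sketch below models that idea with plain dictionaries standing in for column families; `ManualIndex` and its methods are hypothetical names, not a Cassandra API.

```python
from typing import Any, Dict, Set, Tuple

INDEXED = ("columnA", "columnB", "columnC", "columnD")


class ManualIndex:
    """Toy model of a data table plus an application-maintained index."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}
        self.index: Dict[Tuple[Any, ...], Set[str]] = {}

    def _index_key(self, row: Dict[str, Any]) -> Tuple[Any, ...]:
        # The index entry is keyed on all four column values together.
        return tuple(row[column] for column in INDEXED)

    def put(self, key: str, row: Dict[str, Any]) -> None:
        """Write a row and keep the index in step with it."""
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index entry before writing the new one.
            self.index[self._index_key(old)].discard(key)
        self.rows[key] = dict(row)
        self.index.setdefault(self._index_key(row), set()).add(key)

    def find(self, a: Any, b: Any, c: Any, d: Any) -> Set[str]:
        """Return the keys of rows matching all four values."""
        return set(self.index.get((a, b, c, d), ()))


store = ManualIndex()
store.put("k1", {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "d"})
store.put("k2", {"columnA": "a", "columnB": "b", "columnC": "c", "columnD": "z"})
print(store.find("a", "b", "c", "d"))
```

The trade-off in this design is extra work on every write (and on updates, removing the old entry) in exchange for a cheap read; it only helps queries that supply all four values.
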
>
> > 2/ Are there any limitations on the number of criteria we can usually
> > have?
>
> None that I know of. The query will probably run slower the more clauses
> you have.
>
> > 3/ Even though we have different data types (date, string, int), we
> > have stored them all as UTF8Type. Could we expect performance
> > improvements if we used DateType, LongType?
> No. The main issue is going to be the selectivity of the primary scan
> clause, followed by the number of additional clauses. Their types will have
> very little / no impact.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3/07/2012, at 3:59 AM, Olivier Mallassi wrote:
>
> > Hi all
> >
> > We have 4 indexed columns, all configured as UTF8Type (even though one
> > column is a date and another an integer).
> >
> > 1/ The read query we run can have up to 4 criteria:
> > select my_cf where columnA = a and columnB = b and columnC = c
> > and columnD = d
> >
> > This query is fast (<500 ms) with up to 3 criteria, but when we add the
> > fourth one, the execution time is 9.5 s.
> > Do you have any ideas? Is there any way to understand how Cassandra
> > internally runs the query (a kind of "explain plan")?
> >
> > 2/ Are there any limitations on the number of criteria we can usually
> > have?
> >
> > 3/ Even though we have different data types (date, string, int), we
> > have stored them all as UTF8Type. Could we expect performance
> > improvements if we used DateType, LongType?
> >
> > Many thanks for all your answers.
> >
> > --
> > ............................................................
> > Olivier Mallassi
> > OCTO Technology
> > ............................................................
> > 50, Avenue des Champs-Elysées
> > 75008 Paris
> >
> > Mobile: (33) 6 28 70 26 61
> > Tél: (33) 1 58 56 10 00
> > Fax: (33) 1 58 56 10 01
> >
> > http://www.octo.com
> > Octo Talks! http://blog.octo.com
> >
> >
>
>

