cassandra-user mailing list archives

From Patrick McFadin <pmcfa...@gmail.com>
Subject Re: timeout when using secondary index
Date Wed, 11 Mar 2015 03:39:27 GMT
Jimmy,

The secondary index is getting scanned since you put the column in your
query. The behavior you are looking for is a coming feature called Global
Indexes slated for 3.0. https://issues.apache.org/jira/browse/CASSANDRA-6477

In the meantime, you could build your own lookup table, even with
cardinality this low. If the point is to find everyone of a certain gender
in a company, give this a try.

create table company_gender (
   company_id uuid,
   gender text,
   person_id uuid,
   -- person_id as a clustering column keeps one row per person
   PRIMARY KEY (company_id, gender, person_id)
);

Each company would be a partition, and you could find all males or females
with a single query. The bonus is that you would get paging, which will be
much more efficient.
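
As a rough sketch, the write and the lookup could look something like the
following. It assumes person_id is a clustering column, i.e.
PRIMARY KEY (company_id, gender, person_id), so that each person gets their
own row. The company UUID is only a placeholder, and 'male' is an assumed
encoding for the gender column.

```
-- Placeholder company UUID; uuid() generates a random person_id for illustration.
INSERT INTO company_gender (company_id, gender, person_id)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 'male', uuid());

-- Single-partition read, restricted by the first clustering column.
SELECT person_id FROM company_gender
WHERE company_id = 62c36092-82a1-3a00-93d1-46196ee77204
  AND gender = 'male';
```

You would write to this table alongside the original company table, since
Cassandra will not keep the two in sync for you.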

Patrick




On Fri, Mar 6, 2015 at 2:56 PM, Jimmy Lin <y2klyf+work@gmail.com> wrote:

> Hi,
> I ran into an RPC timeout exception when executing a query that involves a
> secondary index on a Boolean column when, for example, the company has more
> than 1k persons.
>
> select * from company where company_id=xxxx and isMale = true;
>
> As the other docs state, a secondary index with such extremely low
> cardinality will basically result in 2 large rows, one per value. However,
> since I also bounded the query with my partition key, I thought that would
> be consulted first to narrow down the result and make the query
> efficient?
>
> Also, if I simply do
> select * from company where company_id=xxxx ;
> (without the AND clause on the secondary index), it returns right away.
>
>
> Or maybe the Cassandra server internally always parses the secondary index
> result first?
>
> thanks
>
>
>
> I have a simple table
>
> create table company (
> company_id uuid,
> person_id uuid,
> isMale Boolean,
> PRIMARY KEY (company_id, person_id)
> )
>
>
>
>
>
