cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ondřej Černoš <cern...@gmail.com>
Subject Re: question about secondary index or not
Date Wed, 29 Jan 2014 09:21:59 GMT
Hi,

we had a similar use case. Just do the filtering client-side, the #2
example performs horribly, secondary indexes on something dividing the set
into two roughly the same size subsets just don't work.

Give it a try on localhost with just a couple of records (150.000), you
will see.

regards,

ondrej


On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin <y2klyf+work@gmail.com> wrote:

> in my #2 example:
> select * from people where company_id='xxx' and gender='male'
>
> I already specify the first part of the primary key(row key) in my where
> clause, so how does the secondary indexed column gender='male" help
> determine which row to return? It is more like filtering a list of column
> from a row(which is exactly I can do that in #1 example).
> But then if I don't create index first, the cql statement will run into
> syntax error.
>
>
>
>
> On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert <
> robert.mullen@pearson.com> wrote:
>
>> I would do #2.   Take a look at this blog which talks about secondary
>> indexes, cardinality, and what it means for cassandra.   Secondary indexes
>> in cassandra are a different beast, so often old rules of thumb about
>> indexes don't apply.   http://www.wentnet.com/blog/?p=77
>>
>>
>> On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo <edlinuxguru@gmail.com>wrote:
>>
>>> Generally indexes on binary fields true/false male/female are not
>>> terrible effective.
>>>
>>>
>>> On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin <y2klyf+work@gmail.com>wrote:
>>>
>>>> I have a simple column family like the following
>>>>
>>>> create table people(
>>>> company_id text,
>>>> employee_id text,
>>>> gender text,
>>>> primary key(company_id, employee_id)
>>>> );
>>>>
>>>> if I want to find out all the "male" employee given a company id, I can
>>>> do
>>>>
>>>> 1/
>>>> select * from people where company_id=xxxx'
>>>> and loop through the result efficiently to pick the employee who has
>>>> gender column value equal to "male"
>>>>
>>>> 2/
>>>> add a seconday index
>>>> create index gender_index on people(gender)
>>>> select * from people where company_id='xxx' and gender='male'
>>>>
>>>>
>>>> I though #2 seems more appropriate, but I also thought the secondary
>>>> index is helping only locating the primary row key, with the select clause
>>>> in #2, is it more efficient than #1 where application responsible loop
>>>> through the result and filter the right content?
>>>>
>>>> (
>>>> It totally make sense if I only need to find out all the male
>>>> employee(and not within a company) by using
>>>> select * from people where gender='male"
>>>> )
>>>>
>>>> thanks
>>>>
>>>
>>>
>>
>

Mime
View raw message