cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mullen, Robert" <robert.mul...@pearson.com>
Subject Re: question about secondary index or not
Date Wed, 29 Jan 2014 15:51:02 GMT
Thanks for that info ondrej, I've never tested out secondary indexes as
I've avoided them because of all the uncertainty around them, and your
statement just adds to the uncertainty.  Everything I had read said that
secondary indexes were supposed to work well for columns with low
cardinality, but I guess that's not always the case.

peace,
Rob


On Wed, Jan 29, 2014 at 2:21 AM, Ondřej Černoš <cernoso@gmail.com> wrote:

> Hi,
>
> we had a similar use case. Just do the filtering client-side, the #2
> example performs horribly, secondary indexes on something dividing the set
> into two roughly the same size subsets just don't work.
>
> Give it a try on localhost with just a couple of records (150.000), you
> will see.
>
> regards,
>
> ondrej
>
>
> On Wed, Jan 29, 2014 at 5:17 AM, Jimmy Lin <y2klyf+work@gmail.com> wrote:
>
>> in my #2 example:
>> select * from people where company_id='xxx' and gender='male'
>>
>> I already specify the first part of the primary key(row key) in my where
>> clause, so how does the secondary indexed column gender='male" help
>> determine which row to return? It is more like filtering a list of column
>> from a row(which is exactly I can do that in #1 example).
>> But then if I don't create index first, the cql statement will run into
>> syntax error.
>>
>>
>>
>>
>> On Tue, Jan 28, 2014 at 11:37 AM, Mullen, Robert <
>> robert.mullen@pearson.com> wrote:
>>
>>> I would do #2.   Take a look at this blog which talks about secondary
>>> indexes, cardinality, and what it means for cassandra.   Secondary indexes
>>> in cassandra are a different beast, so often old rules of thumb about
>>> indexes don't apply.   http://www.wentnet.com/blog/?p=77
>>>
>>>
>>> On Tue, Jan 28, 2014 at 10:41 AM, Edward Capriolo <edlinuxguru@gmail.com
>>> > wrote:
>>>
>>>> Generally indexes on binary fields true/false male/female are not
>>>> terrible effective.
>>>>
>>>>
>>>> On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin <y2klyf+work@gmail.com>wrote:
>>>>
>>>>> I have a simple column family like the following
>>>>>
>>>>> create table people(
>>>>> company_id text,
>>>>> employee_id text,
>>>>> gender text,
>>>>> primary key(company_id, employee_id)
>>>>> );
>>>>>
>>>>> if I want to find out all the "male" employee given a company id, I
>>>>> can do
>>>>>
>>>>> 1/
>>>>> select * from people where company_id=xxxx'
>>>>> and loop through the result efficiently to pick the employee who has
>>>>> gender column value equal to "male"
>>>>>
>>>>> 2/
>>>>> add a seconday index
>>>>> create index gender_index on people(gender)
>>>>> select * from people where company_id='xxx' and gender='male'
>>>>>
>>>>>
>>>>> I though #2 seems more appropriate, but I also thought the secondary
>>>>> index is helping only locating the primary row key, with the select clause
>>>>> in #2, is it more efficient than #1 where application responsible loop
>>>>> through the result and filter the right content?
>>>>>
>>>>> (
>>>>> It totally make sense if I only need to find out all the male
>>>>> employee(and not within a company) by using
>>>>> select * from people where gender='male"
>>>>> )
>>>>>
>>>>> thanks
>>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message