incubator-cassandra-user mailing list archives

From Kaj Magnus Lindberg <kajmagnu...@gmail.com>
Subject Re: Why no need to query all nodes on secondary index lookup?
Date Tue, 06 Sep 2011 09:36:10 GMT
Hi Jonathan

Thanks for the explanation

Thanks, KajMagnus

On Mon, Sep 5, 2011 at 11:05 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> The first node can answer the question as long as you've requested
> fewer rows than the first node has on it.  Hence the "low cardinality"
> point in what you quoted.
>
> On Sat, Sep 3, 2011 at 5:00 AM, Kaj Magnus Lindberg
> <kajmagnus79@gmail.com> wrote:
>> Hello Anyone
>>
>> I have a follow up question on a question from February 2011. In
>> short, I wonder why one won't have to query all Cassandra nodes when
>> doing a secondary index lookup -- although each node only indexes data
>> that it holds locally.
>>
>> The question and answer were:
>>  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
>> === Question ===
>> As far as I understand automatic secondary indexes are generated for
>> node local data.
>>   In this case, does a query by secondary index involve all nodes storing
>> part of the column family to get results? So (if I am right), if data is
>> spread across 50 nodes, then 50 nodes are involved in a single query?
>> [...]
>> === Answer ===
>> In practice, local secondary indexes scale to {RF * the limit of a single
>> machine} for -low cardinality- values (ex: users living in a certain state)
>> since the first node is likely to be able to answer your question. This also
>> means they are good for performing filtering for analytics.
>> [...]
>>
>> === Now I wonder ===
>> Why would the first node be likely to be able to answer the question?
>> It stores only index entries for users on that particular machine,
>>     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
>>     "Each node only indexes data that it holds locally" )
>> but users might be stored by user name? And would thus be stored on
>> many different machines? Even if they happen to live in the same
>> state?
>>
>> Why won't the client need to query the indexes of [all servers that
>> store info on users] to find all relevant users, when doing a user
>> property lookup?
>>
>>
>> Best regards, KajMagnus
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
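
The point about requested row count versus per-node matches can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual read path: the 50-node cluster size is taken from the question, while the MD5-based partitioner, the function names (node_for, build_cluster, indexed_lookup), and the sample rows are all hypothetical. Replication (RF) is not modeled. Each node indexes only the rows it holds, and the lookup asks nodes one at a time, stopping once the requested number of rows has been found.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 50  # assumed cluster size, matching the 50-node example in the question


def node_for(key: str) -> int:
    """Toy partitioner (an assumption): map a row key to a node by hashing it."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


def build_cluster(rows):
    """Build one local index per node: indexed value -> row keys held on that node."""
    indexes = [defaultdict(list) for _ in range(NUM_NODES)]
    for key, state in rows:
        indexes[node_for(key)][state].append(key)
    return indexes


def indexed_lookup(indexes, state, limit):
    """Ask nodes one at a time; stop as soon as `limit` matching rows are found."""
    found, contacted = [], 0
    for local_index in indexes:
        contacted += 1
        found.extend(local_index.get(state, []))
        if len(found) >= limit:
            break
    return found[:limit], contacted


# Hypothetical data: 10,000 users split between two common states,
# plus a single user with a rare value.
rows = [(f"user{i}", "CA" if i % 2 == 0 else "NY") for i in range(10_000)]
rows.append(("zed", "WY"))
cluster = build_cluster(rows)

common_rows, common_contacted = indexed_lookup(cluster, "CA", limit=10)
rare_rows, rare_contacted = indexed_lookup(cluster, "WY", limit=10)
print("common value: nodes contacted =", common_contacted)
print("rare value:   nodes contacted =", rare_contacted)
```

In this model a common value is spread over every node, so a small limit is normally satisfied by the first node asked. The rare value never reaches the limit, so every node has to be consulted, which is the case the original question was worried about.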
