incubator-cassandra-user mailing list archives

From Kaj Magnus Lindberg <kajmagnu...@gmail.com>
Subject Why no need to query all nodes on secondary index lookup?
Date Sat, 03 Sep 2011 10:00:28 GMT
Hello Anyone

I have a follow-up question to a question from February 2011. In
short, I wonder why one doesn't have to query all Cassandra nodes when
doing a secondary index lookup, given that each node only indexes data
that it holds locally.

The question and answer were:
  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
=== Question ===
As far as I understand automatic secondary indexes are generated for
node local data.
   In this case query by secondary index involve all nodes storing part of
column family to get results (?) so (if i am right) if data is spread across
50 nodes then 50 nodes are involved in single query?
[...]
=== Answer ===
In practice, local secondary indexes scale to {RF * the limit of a single
machine} for -low cardinality- values (ex: users living in a certain state)
since the first node is likely to be able to answer your question. This also
means they are good for performing filtering for analytics.
[...]

=== Now I wonder ===
Why would the first node be likely to be able to answer the question?
It stores index entries only for the users held on that particular machine
    (says http://wiki.apache.org/cassandra/SecondaryIndexes:
    "Each node only indexes data that it holds locally" ),
but if users are keyed by user name, wouldn't they be spread across
many different machines, even if they happen to live in the same
state?

Why won't the client need to query the indexes of [all servers that
store info on users] to find all relevant users, when doing a user
property lookup?
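
The scenario can be sketched as a toy model in Python. This is not
Cassandra code: the names Node, owner, build_cluster and query are
hypothetical, rows are placed by an MD5 hash of the user name purely for
illustration, and the optional limit parameter is an assumption about how
the quoted answer might be read (a query that stops once enough rows are
found), not something the answer states.

```python
import hashlib
from collections import defaultdict


class Node:
    """One machine: holds some rows and an index over ONLY those rows."""

    def __init__(self):
        self.rows = {}                   # user name -> state
        self.index = defaultdict(set)    # state -> user names stored locally

    def put(self, user, state):
        self.rows[user] = state
        self.index[state].add(user)

    def lookup(self, state):
        # Uses the local index only; knows nothing about other nodes.
        return sorted(self.index.get(state, ()))


def owner(user, node_count):
    """Pick the node for a row by hashing its key (illustrative placement)."""
    digest = hashlib.md5(user.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def build_cluster(users, node_count):
    nodes = [Node() for _ in range(node_count)]
    for user, state in users.items():
        nodes[owner(user, node_count)].put(user, state)
    return nodes


def query(nodes, state, limit=None):
    """Ask nodes one by one; stop early only if a limit is given and met.

    Returns (matching users, number of nodes contacted).
    """
    found, contacted = [], 0
    for node in nodes:
        contacted += 1
        found.extend(node.lookup(state))
        if limit is not None and len(found) >= limit:
            return found[:limit], contacted
    return found, contacted
```

In this model, query(nodes, "CA") with no limit has to contact every node
to collect all matching users, which is the situation the question
describes. Only when a limit is passed can the loop stop before the last
node, and how soon it stops depends on how many matches each node happens
to hold.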


Best regards, KajMagnus
