I'm not familiar with some of the details, but I'll try to answer your questions in general.  Secondary indexes are implemented as a slightly special separate column family with the indexed value serving as the key; most of the properties of secondary indexes follow from that.

On Sun, Apr 3, 2011 at 2:28 PM, Drew Kutcharian <drew@venarc.com> wrote:
Hi Everyone,

I posted the following email a couple of days ago and didn't get any responses. Makes me wonder, does anyone on this list know/use Secondary Indexes? They seem to me like a pretty big feature and it's a bit disappointing not to be able to find any documentation on them.

The only thing I could find on the Wiki was the end of http://wiki.apache.org/cassandra/StorageConfiguration and that was pointing to the nonexistent page http://wiki.apache.org/cassandra/SecondaryIndexes . In addition, I checked the JIRA CASSANDRA-749 and there's so much back and forth that I couldn't really figure out what the conclusion was. What gives?

I think the Cassandra committers are doing a heck of a job adding all these cool features, but the documentation doesn't really keep up. Jonathan Ellis's blog post on Secondary Indexes only scratches the surface of the topic, and if you consider that the whole point of using Cassandra is scalability, there isn't a single mention of how Secondary Indexes scale!!! (The same applies to Counters.)

I'm not trying to be a complainer, but as someone new to this community, I hope you guys take my comments as constructive criticism.




I just read Jonathan Ellis' great post on Secondary Indexes (http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes) and I was wondering where I can find a bit more info on them. I would like to know:

1) Are there any limitations besides the hash properties (no between queries)? Like size or memory, etc?


2) Are they distributed? If so, how does that work? How are they stored on the nodes?

Each node only indexes data that it holds locally.
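
To make the "local index" idea concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: the Node and Cluster classes, the crc32-based placement, and the lack of replication are all made up for illustration and are not how Cassandra is implemented. It shows each node keeping an index over just the rows it stores, so a lookup by indexed value has to ask every node.

```python
import zlib
from collections import defaultdict


class Node:
    """Hypothetical node: stores rows plus an index over only those rows."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                  # row key -> {column: value}
        self.index = defaultdict(set)   # indexed value -> row keys held on this node

    def write(self, key, columns, indexed_column):
        # Drop the stale index entry if the indexed value is being replaced.
        old = self.rows.get(key, {}).get(indexed_column)
        if old is not None:
            self.index[old].discard(key)
        self.rows[key] = dict(columns)
        new = columns.get(indexed_column)
        if new is not None:
            self.index[new].add(key)

    def lookup(self, value):
        return set(self.index.get(value, ()))


class Cluster:
    """Hypothetical cluster: places rows by a hash of the row key (no replication)."""

    def __init__(self, node_names, indexed_column):
        self.nodes = [Node(name) for name in node_names]
        self.indexed_column = indexed_column

    def node_for(self, key):
        # Stand-in for a partitioner; crc32 is used only to keep the sketch short.
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def write(self, key, columns):
        self.node_for(key).write(key, columns, self.indexed_column)

    def query(self, value):
        # No single node knows every match, so ask all nodes and merge the answers.
        matches = set()
        for node in self.nodes:
            matches |= node.lookup(value)
        return matches


cluster = Cluster(["n1", "n2", "n3"], indexed_column="country")
cluster.write("alice", {"country": "US"})
cluster.write("bob", {"country": "DE"})
cluster.write("carol", {"country": "US"})
print(sorted(cluster.query("US")))
```

In this model the row and its index entry always live on the same node, because the index is keyed off whatever rows that node happens to hold.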

3) When you write a new row, when/how does the index get updated? What I would like to know is the atomicity of the operation, is the "index write" part of the "row write"?

The row and index updates are one atomic operation.

4) Is there a difference between creating a secondary index vs creating an "index" CF manually such as "users_by_country"? 

Yes.  First, when you maintain your own index CF, a node may hold index entries for data that is stored on another node.  Second, updates to the index and to the data are not atomic.
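
As a rough illustration of both differences, the sketch below lists the writes a client would issue when maintaining "users_by_country" by hand. The node list, the crc32-based placement, the "users" CF name, and the dictionary layout are assumptions made up for this example; no real client API is shown. The point is that the index row is keyed by the country, so it is placed independently of the user row, and that the client sends two separate writes.

```python
import zlib

NODES = ["n1", "n2", "n3"]  # hypothetical cluster members


def node_for(row_key):
    # Stand-in for a partitioner: placement depends only on the row key.
    return NODES[zlib.crc32(row_key.encode("utf-8")) % len(NODES)]


def manual_index_writes(user_key, country):
    """Return the two independent writes a client would issue for one user."""
    data_write = {
        "cf": "users",
        "row": user_key,
        "column": "country",
        "value": country,
        "node": node_for(user_key),   # placed by the user's row key
    }
    index_write = {
        "cf": "users_by_country",
        "row": country,               # the index row is keyed by the indexed value
        "column": user_key,
        "value": "",
        "node": node_for(country),    # placed by the country, not by the user
    }
    return [data_write, index_write]


for write in manual_index_writes("alice", "US"):
    print(write["cf"], write["row"], "->", write["node"])
```

If the client fails between the two writes, the data and the hand-built index disagree until something repairs them, which is the non-atomicity mentioned above.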

Your feedback is certainly helpful and hopefully we can get some of these details into the documentation!

Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library