From Stu Hood <>
Subject Re: How do secondary indices work
Date Wed, 09 Feb 2011 11:09:44 GMT

The secondary indexes in 0.7.0 (type KEYS) are stored internally in a column
family, and are kept synchronized with the base data via locking on a local
node, meaning they are always consistent on the local node. Eventual
consistency still applies between nodes, but a returned result will always
match your query.

This index column family stores a mapping from index values to a sorted list
of matching row keys. When you query for rows between x and y matching a
value z (via the get_indexed_slices call), Cassandra performs a lookup to
the index column family for the slice of columns in row z between x and y.
If any matches are found in the index, they are row keys that match the
index clause, and we query the base data to return you those rows.
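
As a rough sketch of that layout: the following is a toy in-memory model, not
Cassandra's actual code. The class and method names (ToyKeysIndex, put,
get_indexed) are made up for illustration, and it sorts the row keys as plain
strings, which is a simplification.

```python
import bisect
from collections import defaultdict


class ToyKeysIndex:
    """Toy model of a KEYS-style index on a single node (hypothetical names)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.base = {}                   # row key -> {column name: value}
        self.index = defaultdict(list)   # indexed value -> sorted row keys

    def put(self, row_key, column, value):
        # Base data and index are updated together, standing in for the
        # local locking described above.
        row = self.base.setdefault(row_key, {})
        if column == self.indexed_column:
            old = row.get(column)
            if old is not None:
                # Drop the stale index entry for the previous value.
                keys = self.index[old]
                i = bisect.bisect_left(keys, row_key)
                if i < len(keys) and keys[i] == row_key:
                    keys.pop(i)
            # Insert the row key under the new value, keeping the list sorted.
            keys = self.index[value]
            i = bisect.bisect_left(keys, row_key)
            if i == len(keys) or keys[i] != row_key:
                keys.insert(i, row_key)
        row[column] = value

    def get_indexed(self, value, start_key, end_key):
        # Slice the index "row" for `value` between start_key and end_key,
        # then read the matching rows from the base data.
        keys = self.index.get(value, [])
        lo = bisect.bisect_left(keys, start_key)
        hi = bisect.bisect_right(keys, end_key)
        return [(k, self.base[k]) for k in keys[lo:hi]]
```

Here index[z] plays the role of row z in the index column family, and
get_indexed(z, x, y) is the slice between x and y followed by the base lookup.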

Iterating through all of the rows matching an index clause on your cluster
has to touch at least N/RF of the nodes (N being the number of nodes and RF
the replication factor), because each node only knows about data that is
indexed locally.

Some portions of the indexing implementation are not fully baked yet: for
instance, although the API allows you to specify multiple columns, only one
index will actually be used per query, and the rest of the clauses will be
brute forced.
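
To illustrate what "brute forced" means here, a minimal sketch, assuming
equality clauses only. The function name and data shapes are hypothetical,
and picking the first indexed clause is an arbitrary choice for the example,
not a statement about how Cassandra selects the index.

```python
def query(base, indexes, clauses):
    """Answer a multi-clause equality query using a single index.

    base:    row key -> {column: value}
    indexes: column -> {value: sorted list of row keys}
    clauses: list of (column, value) pairs
    """
    # Use one index only: here, the first clause on an indexed column.
    indexed = next((c for c in clauses if c[0] in indexes), None)
    if indexed is None:
        raise ValueError("need at least one clause on an indexed column")
    column, value = indexed
    candidates = indexes[column].get(value, [])

    # Every other clause is checked row by row against the base data.
    remaining = [c for c in clauses if c is not indexed]
    return [
        key for key in candidates
        if all(base[key].get(col) == val for col, val in remaining)
    ]
```

The cost of the brute-force step grows with the number of candidates the
chosen index returns, not with the number of rows that match every clause.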

A second secondary index implementation has been on the back burner for a
while: it provides an identical API, but does not use a column family to
store the index, and should be more efficient for append only data. See


On Wed, Feb 9, 2011 at 2:35 AM, <> wrote:

> Thank you for the links, I did read a bit in the comments of the ticket,
> but I couldn't get much out of it.
> I am mainly interested in how the index is stored and partitioned, not how
> it is used. I think the people in the dev list will probably be better
> qualified to answer that. My questions always seem to get moved to the
> user list, and usually with good cause, but I think this time it should be
> in the dev list :) Please move it back, if you can.
> Alexander
> > AFAIK this was the ticket the original work was done under
> >
> >
> > also
> > and http://pycassa.github.com/pycassa/tutorial.html#indexes may help
> >
> > (sorry on reflection the email prob did not need to be moved from dev, my
> > bad)
> > Aaron
> >
> > On 09 Feb, 2011, at 09:16 AM, Aaron Morton <> wrote:
> >
> > Moving to the user group.
> >
> >
> >
> > On 08 Feb, 2011, at 11:39 PM, wrote:
> >
> > Hello,
> >
> > I'd like some information about how secondary indices work under the hood.
> >
> > 1) Is data stored in some external data structure, or is it stored in an
> > actual Cassandra table, as columns within column families?
> > 2) Is data stored sorted or not? How is it partitioned?
> > 3) How can I access index data?
> >
> > Thanks in advance,
> >
> > Alexander Altanis
> >

