incubator-cassandra-user mailing list archives

From alta...@ceid.upatras.gr
Subject Re: How do secondary indices work
Date Wed, 09 Feb 2011 13:08:09 GMT
Thank you very much, this is the information I was looking for. I started
adding secondary index functionality to Cassandra myself, and it turns out
I am doing almost exactly the same thing. I will try to change my code to
use your implementation as well to compare results.

Alexander

> Alexander:
>
> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a column
> family, and are kept synchronized with the base data via locking on a local
> node, meaning they are always consistent on the local node. Eventual
> consistency still applies between nodes, but a returned result will always
> match your query.
>
> This index column family stores a mapping from index values to a sorted list
> of matching row keys. When you query for rows between x and y matching a
> value z (via the get_indexed_slices call), Cassandra performs a lookup to
> the index column family for the slice of columns in row z between x and y.
> If any matches are found in the index, they are row keys that match the
> index clause, and we query the base data to return you those rows.
>
> Iterating through all of the rows matching an index clause on your cluster
> is guaranteed to touch N/RF of the nodes in your cluster, because each node
> only knows about data that is indexed locally.
>
> Some portions of the indexing implementation are not fully baked yet: for
> instance, although the API allows you to specify multiple columns, only one
> index will actually be used per query, and the rest of the clauses will be
> brute forced.
>
> A second secondary index implementation has been on the back burner for a
> while: it provides an identical API, but does not use a column family to
> store the index, and should be more efficient for append only data. See
> https://issues.apache.org/jira/browse/CASSANDRA-1472
>
> Thanks,
> Stu
>
> On Wed, Feb 9, 2011 at 2:35 AM, <altanis@ceid.upatras.gr> wrote:
>
>> Thank you for the links, I did read a bit in the comments of the ticket,
>> but I couldn't get much out of it.
>>
>> I am mainly interested in how the index is stored and partitioned, not how
>> it is used. I think the people in the dev list will probably be better
>> qualified to answer that. My questions always seem to get moved to the
>> user list, and usually with good cause, but I think this time it should be
>> in the dev list :) Please move it back, if you can.
>>
>> Alexander
>>
>> > AFAIK this was the ticket the original work was done under
>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>> >
>> > also http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>> > and http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>> >
>> > (sorry on reflection the email prob did not need to be moved from dev,
>> > my bad)
>> > Aaron
>> >
>> > On 09 Feb, 2011, at 09:16 AM, Aaron Morton <aaron@thelastpickle.com> wrote:
>> >
>> > Moving to the user group.
>> >
>> >
>> >
>> > On 08 Feb, 2011, at 11:39 PM, altanis@ceid.upatras.gr wrote:
>> >
>> > Hello,
>> >
>> > I'd like some information about how secondary indices work under the
>> > hood.
>> >
>> > 1) Is data stored in some external data structure, or is it stored in
>> > an actual Cassandra table, as columns within column families?
>> > 2) Is data stored sorted or not? How is it partitioned?
>> > 3) How can I access index data?
>> >
>> > Thanks in advance,
>> >
>> > Alexander Altanis
>> >
>>
>
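
The index layout described in Stu's reply above (a mapping from each indexed
value to a sorted list of matching row keys, a slice over that list, then a
read of the base rows) can be sketched as a toy single-node model. This is
not Cassandra's code: the class name IndexedColumnFamily, the extra_clauses
parameter, and the use of plain string ordering for row keys are all
assumptions made for illustration, and updating both dictionaries inside one
method merely stands in for the local locking mentioned in the reply.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict


class IndexedColumnFamily:
    """Toy single-node model of a KEYS-style index (hypothetical, not Cassandra code)."""

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                  # base data: row key -> {column: value}
        self.index = defaultdict(list)  # index "column family": value -> sorted row keys

    def insert(self, key, columns):
        # Base data and index are updated together, so a local read never
        # sees an index entry that disagrees with the base row.
        existing = self.rows.get(key, {})
        merged = {**existing, **columns}
        old = existing.get(self.indexed_column)
        new = merged.get(self.indexed_column)
        if old != new:
            if old is not None:
                self.index[old].remove(key)   # drop the stale index entry
            if new is not None:
                insort(self.index[new], key)  # keep row keys sorted
        self.rows[key] = merged

    def get_indexed_slices(self, value, start, end, extra_clauses=None):
        # Slice the index row for `value` between the start and end row keys
        # (both inclusive here, which is an assumption of this sketch).
        keys = self.index.get(value, [])
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        results = []
        for key in keys[lo:hi]:
            row = self.rows[key]  # read the base data for each index hit
            # Only one index drives the query; any further clauses are
            # checked row by row ("brute forced").
            if all(row.get(c) == v for c, v in (extra_clauses or {}).items()):
                results.append((key, row))
        return results
```

In this model a query such as get_indexed_slices("TX", "k1", "k3") touches
only the index row for "TX", and any extra_clauses are filtered against the
fetched base rows rather than looked up in a second index.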

