cassandra-user mailing list archives

From "Wangpei (Peter)" <>
Subject Re: Indexes on heterogeneous rows
Date Fri, 15 Apr 2011 09:25:36 GMT
Does get_indexed_slices in version 0.7.4 already work that way?
It seems to always take the first indexed column with EQ.
Or is this a new feature coming in 0.7.5 or 0.8?

From: Jonathan Ellis []
Sent: April 15, 2011 0:21
Cc: David Boxenhorn; aaron morton
Subject: Re: Indexes on heterogeneous rows

This should work reasonably well w/ 0.7 indexes. Cassandra tracks
statistics on index selectivity, so it would plan that query as "index
lookup on e=5, then iterate over those results and return only rows
that also have type=2."
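
A minimal sketch of the plan described above, assuming plain Python dicts stand in for the column family and its secondary indexes. The names (rows, build_index, indexed_query) are hypothetical and this is not Cassandra's actual implementation; in particular, exact match counts are used here in place of the selectivity statistics Cassandra tracks.

```python
from collections import defaultdict

# Hypothetical data: row key -> columns (heterogeneous rows, as in the thread).
rows = {
    "k1": {"type": 2, "d": 1, "e": 5, "f": 0},
    "k2": {"type": 2, "d": 4, "e": 7, "f": 1},
    "k3": {"type": 1, "a": 5, "b": 2, "c": 3},
    "k4": {"type": 2, "d": 9, "e": 5, "f": 2},
    "k5": {"type": 3, "g": 1, "h": 5, "i": 8},
}


def build_index(rows, column):
    """Map each value of `column` to the set of row keys holding that value."""
    index = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:  # rows without the column are simply not indexed
            index[cols[column]].add(key)
    return index


# Assumption: both "type" and "e" have secondary indexes.
indexes = {col: build_index(rows, col) for col in ("type", "e")}


def indexed_query(rows, indexes, clauses):
    """Answer an AND of equality clauses given as {column: value}.

    Drives the lookup from the indexed clause with the fewest matches,
    then filters those candidate rows against every clause.
    """
    indexed = [c for c in clauses if c in indexes]
    if not indexed:
        raise ValueError("at least one clause must be on an indexed column")
    # Stand-in for selectivity statistics: count the matching index entries.
    driver = min(indexed, key=lambda c: len(indexes[c].get(clauses[c], ())))
    candidates = indexes[driver].get(clauses[driver], set())
    matches = sorted(
        key for key in candidates
        if all(rows[key].get(c) == v for c, v in clauses.items())
    )
    return driver, matches


driver, matches = indexed_query(rows, indexes, {"type": 2, "e": 5})
print(driver, matches)
```

With this toy data the lookup is driven by the index on e, since fewer rows match e=5 than type=2, and the type=2 check is applied only to those candidates.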

On Thu, Apr 14, 2011 at 5:33 AM, David Boxenhorn <> wrote:
> Thank you for your answer, and sorry about the sloppy terminology.
> I'm thinking of the scenario where there are a small number of results in
> the result set, but there are billions of rows in the first of your
> secondary indexes.
> That is, I want to do something like (not sure of the CQL syntax):
> select * where type=2 and e=5
> where there are billions of rows of type 2, but some manageable number of
> those rows have e=5.
> As I understand it, secondary indexes are like column families, where each
> value is a column. So the billions of rows where type=2 would go into a
> single row of the secondary index. This sounds like a problem to me, is it?
> I'm assuming that the billions of rows that don't have column "e" at all
> (those rows of other types) are not a problem at all...
> On Thu, Apr 14, 2011 at 12:12 PM, aaron morton <>
> wrote:
>> Need to clear up some terminology here.
>> Rows have a key and can be retrieved by key. This is *sort of* the primary
>> index, but not primary in the normal RDBMS sense.
>> Rows can have different columns and the column names are sorted and can be
>> efficiently selected.
>> There are "secondary indexes" in Cassandra 0.7 based on column
>> values.
>> So you could create secondary indexes on the a, e, and h columns and get
>> rows that have specific values. There are some limitations to secondary
>> indexes, read the linked article.
>> Or you can make your own secondary indexes using row keys as the index
>> values.
>> If you have billions of rows, how many do you need to read back at once?
>> Hope that helps
>> Aaron
>> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
>> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have
>> different sets of columns?
>> For example, let's say you have three types of objects (1, 2, 3) which
>> each had three members. If your rows had the following pattern
>> type=1 a=? b=? c=?
>> type=2 d=? e=? f=?
>> type=3 g=? h=? i=?
>> could you index "type" as your primary index, and also index "a", "e", "h"
>> as secondary indexes, to get the objects of that type that you are looking
>> for?
>> Would it work if you had billions of rows of each type?

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support