cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Indexes on heterogeneous rows
Date Thu, 14 Apr 2011 11:07:50 GMT
You could make your own inverted index by using row keys like "e=5-type=2", where the columns are
either the keys of the matching objects or the objects themselves. Then just read the full row back.
This works if you know you will always want to run queries like that.
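
To make the idea concrete, here is a minimal sketch that uses plain Python dicts in place of column families. It is not Cassandra client code; the names `objects`, `inverted_index`, `index_key`, `insert` and `query` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
objects = {}                         # object key -> {column name: value}
inverted_index = defaultdict(dict)   # index row key -> {object key: ""}


def index_key(e, type_):
    """Build the composite index row key, e.g. 'e=5-type=2'."""
    return f"e={e}-type={type_}"


def insert(obj_key, columns):
    """Store the object and, if it has both fields, add it to the index row."""
    objects[obj_key] = columns
    if "e" in columns and "type" in columns:
        # The column name is the object key; the column value is left empty.
        inverted_index[index_key(columns["e"], columns["type"])][obj_key] = ""


def query(e, type_):
    """Read one index row back and fetch the objects it points to."""
    row = inverted_index.get(index_key(e, type_), {})
    return [objects[k] for k in sorted(row)]


insert("obj1", {"type": 2, "d": 1, "e": 5, "f": 9})
insert("obj2", {"type": 2, "d": 1, "e": 7, "f": 9})
insert("obj3", {"type": 1, "a": 5, "b": 0, "c": 0})
matches = query(5, 2)   # reads only the index row "e=5-type=2"
```

The index row only ever holds the objects that match both conditions, so the read cost follows the size of the result rather than the number of type=2 rows. The price is that the application has to keep the index row up to date on every write.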

This recent discussion and blog post from Ed are good background: http://www.mail-archive.com/user@cassandra.apache.org/msg12136.html

I'm not sure how efficient the join from "e" to "type" would be. AFAIK it will iterate over all keys
where e=5 and look up the corresponding rows to find out whether type=2.
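
As a rough illustration of that access pattern (my understanding only, not the actual Cassandra internals), the sketch below models an index on "e" as a dict from value to row keys and counts how many rows get read to answer the query. The function names and the sample rows are hypothetical.

```python
def build_value_index(rows, column):
    """Map each value of `column` to the list of row keys that hold it."""
    index = {}
    for key, cols in rows.items():
        if column in cols:
            index.setdefault(cols[column], []).append(key)
    return index


def query_via_single_index(rows, e_index, e, type_):
    """Walk every key where e matches, read the row, then filter on type."""
    candidates = e_index.get(e, [])
    hits = [k for k in candidates if rows[k].get("type") == type_]
    # Every candidate row is read, whether or not it ends up matching.
    return hits, len(candidates)


# Hypothetical sample data; r2 is a row of another type that also has "e".
rows = {
    "r1": {"type": 2, "e": 5},
    "r2": {"type": 3, "e": 5},
    "r3": {"type": 2, "e": 7},
    "r4": {"type": 1, "a": 5},
}
e_index = build_value_index(rows, "e")
hits, rows_read = query_via_single_index(rows, e_index, 5, 2)
```

In this model the work is proportional to the number of rows where e=5, not to the number of rows that finally match, which is why it matters which condition is the more selective one.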

If you know how you want to read things back and need to deal with lots-o-data, I would start testing
with custom indexes. Then compare them to the built-in ones; it should be reasonably simple to add
them for a test.

Hope that helps. 
Aaron
   
On 14 Apr 2011, at 22:33, David Boxenhorn wrote:

> Thank you for your answer, and sorry about the sloppy terminology.
> 
> I'm thinking of the scenario where there are a small number of results in the result
> set, but there are billions of rows in the first of your secondary indexes.
> 
> That is, I want to do something like (not sure of the CQL syntax):
> 
> select * where type=2 and e=5
> 
> where there are billions of rows of type 2, but some manageable number of those rows
> have e=5.
> 
> As I understand it, secondary indexes are like column families, where each value is a
> column. So the billions of rows where type=2 would go into a single row of the secondary index.
> This sounds like a problem to me, is it?
> 
> I'm assuming that the billions of rows that don't have column "e" at all (those rows
> of other types) are not a problem at all...
> 
> On Thu, Apr 14, 2011 at 12:12 PM, aaron morton <aaron@thelastpickle.com> wrote:
> Need to clear up some terminology here. 
> 
> Rows have a key and can be retrieved by key. This is *sort of* the primary index, but
> not primary in the normal RDBMS sense.
> Rows can have different columns and the column names are sorted and can be efficiently
> selected.
> There are "secondary indexes" in cassandra 0.7 based on column values http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
> 
> So you could create secondary indexes on the a, e, and h columns and get rows that have
> specific values. There are some limitations to secondary indexes; read the linked article.
> 
> Or you can make your own secondary indexes using row keys as the index values.
> 
> If you have billions of rows, how many do you need to read back at once?
> 
> Hope that helps
> Aaron
>     
> On 14 Apr 2011, at 04:23, David Boxenhorn wrote:
> 
>> Is it possible in 0.7.x to have indexes on heterogeneous rows, which have different
>> sets of columns?
>> 
>> For example, let's say you have three types of objects (1, 2, 3) which each had three
>> members. If your rows had the following pattern
>> 
>> type=1 a=? b=? c=?
>> type=2 d=? e=? f=?
>> type=3 g=? h=? i=?
>> 
>> could you index "type" as your primary index, and also index "a", "e", "h" as secondary
>> indexes, to get the objects of that type that you are looking for?
>> 
>> Would it work if you had billions of rows of each type?
> 
> 

