incubator-cassandra-user mailing list archives

From Courtney Robinson <>
Subject indexing methods
Date Fri, 03 Sep 2010 10:26:26 GMT
A few of us are working on a book on Cassandra and have got to the point where we (well, I did
anyway) wanted to include an example of a non-trivial inverted index.

I've been playing around with different ideas on how I could store the data, and I've had
a look at the previous threads that touched on the subject, but with the 2 or 3 ideas I've
seen on the list someone always points out something in the approach that punches a hole in
it.

My current idea is to use a column family for the index, where I store the terms as the row
keys; each column name is a 64-bit long and its value is a doc id. If the column name
represents a ranking for the doc id it stores, and the CompareWith option is LongType, then
once a term's row is retrieved the first x columns would represent the most closely related
docs for that term.
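
As a rough illustration of the layout described above, the following Python sketch models
one index row per term in memory. It does not talk to Cassandra at all; the class and method
names (InvertedIndexModel, add, top) are hypothetical and only mimic how columns sorted by a
long-valued name would behave, assuming a lower number means a better rank.

```python
import bisect
from collections import defaultdict


class InvertedIndexModel:
    """In-memory stand-in for the index column family (hypothetical names)."""

    def __init__(self):
        # Row key (term) -> list of (rank, doc_id) pairs kept sorted by rank,
        # mimicking columns ordered by a long-valued column name.
        self._rows = defaultdict(list)

    def add(self, term, rank, doc_id):
        """Store doc_id under the column named `rank` in the row for `term`."""
        columns = self._rows[term]
        names = [name for name, _ in columns]
        pos = bisect.bisect_left(names, rank)
        if pos < len(columns) and columns[pos][0] == rank:
            # A column name is unique within a row, so a repeated rank
            # replaces the doc id already stored there.
            columns[pos] = (rank, doc_id)
        else:
            columns.insert(pos, (rank, doc_id))

    def top(self, term, count):
        """Return the doc ids held in the first `count` columns of the row."""
        return [doc_id for _, doc_id in self._rows.get(term, [])[:count]]


index = InvertedIndexModel()
index.add("cassandra", 3, "doc-c")
index.add("cassandra", 1, "doc-a")
index.add("cassandra", 2, "doc-b")
print(index.top("cassandra", 2))
```

One thing the sketch makes visible: because the rank is the column name, two docs that share
the same rank for a term would land in the same column, so the rank values would need to be
unique per term.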

I'd go on in more detail, but I'm using my phone to write this and I think that gets the idea
across. Of course, my first question is: is it scalable? In a system where possibly millions
of docs are related to one term, is it a good idea to have potentially that many columns
in one row, all associated with the single row key that is the term?

I'd just like to know what others think, and whether you have any suggestions or have
something similar implemented that you're able to share.

On a side note, there has been a bit of talk about secondary indexes in 0.7. Can anyone
shed some light on that, or point me to a presentation or the like where it's mentioned, so
I can get a better idea of what it's for?
