On 8/6/10 6:36 PM, Benjamin Black wrote:
Same answer as on other thread right now about how to index:


On Fri, Aug 6, 2010 at 6:18 PM, Mark <static.void.dev@gmail.com> wrote:
On 8/6/10 4:50 PM, Thomas Heller wrote:
Thanks for the suggestion.

I somewhat understand all that; the point where I start to get lost
is when I want to figure out something like

Continuing with your example: "Over the last X amount of days give me all
the logs for remote_addr:XXX".
I'm guessing I would need to create a separate index ColumnFamily???

Depending on your needs you can either insert them directly or pull
them out later in some map/reduce fashion. What you want is another
ColumnFamily with a similar structure.

ColumnFamily Standard "LogByRemoteAddrAndDate" CompareWith: TimeUUID

Row: "" Column TimeUUID/JSON as usual. If you want
to "link" to the actual log record (to avoid writing it multiple
times) just insert the same timeuuid you inserted into the other CF
and leave the value empty. So you have your "Index", aka list of
column names, and you can look up the actual values using get_slice
with column_names.
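
A minimal in-memory sketch of that index pattern, using plain Python dicts instead of a real Cassandra client. The row key layout ("remote_addr:date"), the primary CF keyed by day, and all function names here are assumptions made for illustration; none of them come from this thread.

```python
import json
import uuid
from collections import defaultdict

# In-memory stand-ins for two column families: row key -> {column name -> value}.
logs = defaultdict(dict)                         # hypothetical primary CF with the full records
log_by_remote_addr_and_date = defaultdict(dict)  # index CF: column names only, empty values


def index_row_key(remote_addr, day):
    # Assumed key layout (not given in the thread): "<remote_addr>:<YYYY-MM-DD>".
    return f"{remote_addr}:{day}"


def insert_log(remote_addr, day, record):
    # The same time-based UUID is used as the column name in both CFs.
    col = uuid.uuid1()
    logs[day][col] = json.dumps(record)
    # The index column carries no value; its name is the "link".
    log_by_remote_addr_and_date[index_row_key(remote_addr, day)][col] = ""
    return col


def logs_for(remote_addr, day):
    # Step 1: read the index row to collect the column names.
    names = list(log_by_remote_addr_and_date[index_row_key(remote_addr, day)])
    # Step 2: fetch exactly those columns from the primary CF, which is
    # the role get_slice with column_names plays in the description above.
    return [json.loads(logs[day][name]) for name in names]
```

The index row stores nothing but column names, so each log record is written in full only once.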

Confusing at first, but really quite simple once you get used to the
idea. Just a lot more work than letting SQL do it for you. ;)


Ok, I think the part I was missing was the concatenation of the key and
partition to do the look ups. Is this the preferred way of accomplishing
needs such as this? Are there alternative ways?

How would one then "query" over multiple days? Same question for all days.
Should I use range_slice or multiget_slice? And if it's range_slice, does that
mean I need OrderPreservingPartitioner?
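
One way to frame the multi-day case, as a sketch only: if each index row covers one address and one day, the row keys for the last X days can be generated on the client and fetched in a single multiget-style call. The key layout ("remote_addr:YYYY-MM-DD") and both function names below are assumptions for illustration, and the dict lookup only stands in for a real multiget_slice request.

```python
from datetime import date, timedelta


def row_keys_for_last_days(remote_addr, days, today):
    # One key per day, newest first. The "<remote_addr>:<YYYY-MM-DD>"
    # layout is an assumption, not something stated in the thread.
    return [
        f"{remote_addr}:{(today - timedelta(days=i)).isoformat()}"
        for i in range(days)
    ]


def multiget(column_family, keys):
    # Stand-in for a multiget_slice-style call: return the rows that
    # exist for an explicit list of keys.
    return {k: column_family[k] for k in keys if k in column_family}


if __name__ == "__main__":
    index_cf = {
        "10.0.0.1:2010-08-06": {"uuid-a": ""},
        "10.0.0.1:2010-08-04": {"uuid-b": ""},
    }
    keys = row_keys_for_last_days("10.0.0.1", 3, date(2010, 8, 6))
    print(multiget(index_cf, keys))
```

Because the keys are listed explicitly, this lookup does not depend on the order in which rows are stored.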

This kind of helped me too (taken from the DataModel wiki)

"Since there are no automatically-provided indexes, you will be much closer to one ColumnFamily per query than you would have been with tables:queries relationally"

This is significantly different from what I am used to. In an RDBMS, one table = many queries, whereas in Cassandra one column family roughly equals one query.