So I've been thinking about the problem of how to do range queries on keys with random partitioning. I'm new to Cassandra, and I don't know what the plans are, but I have an idea and I thought I'd just put it out there: Predicate Indexes.
I would like to be able to define predicate indexes in Cassandra, something like this:
<ColumnFamily Name="Super1" ColumnType="Super" CompareWith="BytesType" CompareSubcolumnsWith="BytesType">
  <Index Name="Cat1" Type="Range" Start="CATEGORY1." Finish="CATEGORY1/" />
  <Index Name="Cat2" Type="Regex" Regex="^CATEGORY2\." />
</ColumnFamily>
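To make the two predicate types concrete, here is a minimal Java sketch of how a node might test a key against each kind of Index element. Everything in it (KeyPredicate, RangePredicate, RegexPredicate) is hypothetical, not existing Cassandra code. It assumes Start and Finish are both inclusive and that keys are compared as unsigned UTF-8 bytes. Under that assumption "CATEGORY1/" works as the Finish bound because '/' is the byte immediately after '.', so the range covers every key beginning with "CATEGORY1." (plus the key "CATEGORY1/" itself). The regex is written with the dot escaped so it matches only a literal period.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical sketch of predicate-index matching; none of these types exist in Cassandra.
class KeyPredicates {
    /** A test over row keys, as one Index element would define it. */
    interface KeyPredicate {
        boolean matches(String key);
    }

    /** Type="Range": Start <= key <= Finish, comparing UTF-8 bytes as unsigned values. */
    static final class RangePredicate implements KeyPredicate {
        private final byte[] start;
        private final byte[] finish;

        RangePredicate(String start, String finish) {
            this.start = start.getBytes(StandardCharsets.UTF_8);
            this.finish = finish.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public boolean matches(String key) {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            return compareUnsigned(start, k) <= 0 && compareUnsigned(k, finish) <= 0;
        }
    }

    /** Type="Regex": the pattern must be found in the key; a leading "^" anchors it at the start. */
    static final class RegexPredicate implements KeyPredicate {
        private final Pattern pattern;

        RegexPredicate(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public boolean matches(String key) {
            return pattern.matcher(key).find();
        }
    }

    /** Lexicographic byte comparison; a proper prefix sorts before the longer array. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        KeyPredicate cat1 = new RangePredicate("CATEGORY1.", "CATEGORY1/");
        KeyPredicate cat2 = new RegexPredicate("^CATEGORY2\\.");
        String[] keys = {"CATEGORY1.apple", "CATEGORY10.apple", "CATEGORY2.pear", "CATEGORY2Xpear"};
        for (String key : keys) {
            System.out.println(key + " -> Cat1=" + cat1.matches(key) + ", Cat2=" + cat2.matches(key));
        }
    }
}
```

Note that "CATEGORY10.apple" falls outside Cat1: its tenth byte, '0', sorts after the '/' in the Finish bound.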
At each node, Cassandra would maintain one index per predicate, holding every locally stored key that matches that predicate. Within each index, keys would be kept in the order the Random Partitioner implies, i.e. sorted by token (the MD5 hash of the key) rather than by the key itself.
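A node-local index along these lines could be as simple as a sorted set of matching keys, fed from the write and delete paths. The Java sketch below is hypothetical (LocalPredicateIndex, onInsert and onRemove are illustrative names, not Cassandra APIs) and assumes the Random Partitioner token is the absolute value of the key's MD5 digest read as a BigInteger, with keys encoded as UTF-8. The point is only that iterating the set yields keys in token order rather than key order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch of one node's predicate index; not Cassandra code.
class LocalPredicateIndex {
    /** Assumed Random-Partitioner-style token: |MD5(UTF-8 key bytes)| as a BigInteger. */
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key.getBytes(StandardCharsets.UTF_8))).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Orders keys by token, breaking ties by the key so distinct keys are never merged. */
    private static final Comparator<String> TOKEN_ORDER = (a, b) -> {
        int cmp = token(a).compareTo(token(b));
        return cmp != 0 ? cmp : a.compareTo(b);
    };

    private final Predicate<String> predicate;
    private final NavigableSet<String> keys = new TreeSet<>(TOKEN_ORDER);

    LocalPredicateIndex(Predicate<String> predicate) {
        this.predicate = predicate;
    }

    /** Write-path hook: a key is indexed only if it satisfies the predicate. */
    void onInsert(String key) {
        if (predicate.test(key)) {
            keys.add(key);
        }
    }

    /** Delete-path hook: removing a key that was never indexed is a no-op. */
    void onRemove(String key) {
        keys.remove(key);
    }

    /** Snapshot of the indexed keys in token order, not key order. */
    List<String> keysInTokenOrder() {
        return new ArrayList<>(keys);
    }

    public static void main(String[] args) {
        LocalPredicateIndex cat1 = new LocalPredicateIndex(k -> k.startsWith("CATEGORY1."));
        for (String key : new String[] {"CATEGORY1.a", "CATEGORY1.b", "CATEGORY2.c", "CATEGORY1.c"}) {
            cat1.onInsert(key);
        }
        for (String key : cat1.keysInTokenOrder()) {
            System.out.println(token(key) + "  " + key);
        }
    }
}
```

The tie-break on the key itself only matters if two keys ever hash to the same token, which is vanishingly unlikely with MD5 but keeps the set from silently dropping a key. The extra disk space the proposal accepts corresponds to persisting a structure like this per index.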
KeyRange would need a new attribute, Name, identifying the index to scan (i.e. setName(String name), getName(), etc.).
To read an index, we would loop through its keys as we do now, passing the last key returned as the start key of the next request until we reach the end. The results would not be in key order, but we would have very quick access to every key in the range the predicate defines.
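Putting the named KeyRange and the paging loop together, a client might read a whole index roughly as follows. This is a sketch under assumptions: NamedKeyRange and IndexedNode are hypothetical stand-ins, not Cassandra's Thrift KeyRange or any real server call; everything runs in one JVM; the token function is the same assumed MD5-based one as above; and the start key is treated as exclusive so no key repeats across pages. The loop stops when a page comes back shorter than the requested count.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: NamedKeyRange and IndexedNode are illustrative, not Cassandra's Thrift API.
class IndexPaging {
    /** A KeyRange carrying the proposed Name attribute. */
    static final class NamedKeyRange {
        private String name;
        private String startKey = ""; // empty: start from the beginning of the index
        private int count = 100;

        String getName() { return name; }
        NamedKeyRange setName(String name) { this.name = name; return this; }
        String getStartKey() { return startKey; }
        NamedKeyRange setStartKey(String startKey) { this.startKey = startKey; return this; }
        int getCount() { return count; }
        NamedKeyRange setCount(int count) { this.count = count; return this; }
    }

    /** Assumed Random-Partitioner-style token: |MD5(UTF-8 key bytes)|. */
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    static final Comparator<String> TOKEN_ORDER = (a, b) -> {
        int cmp = token(a).compareTo(token(b));
        return cmp != 0 ? cmp : a.compareTo(b);
    };

    /** Stands in for a node that keeps one token-ordered key set per named index. */
    static final class IndexedNode {
        private final Map<String, NavigableSet<String>> indexes = new HashMap<>();

        void index(String indexName, String key) {
            indexes.computeIfAbsent(indexName, n -> new TreeSet<>(TOKEN_ORDER)).add(key);
        }

        /** Up to count keys from the named index positioned after startKey (exclusive). */
        List<String> getRange(NamedKeyRange range) {
            NavigableSet<String> keys = indexes.get(range.getName());
            if (keys == null) {
                return Collections.emptyList();
            }
            NavigableSet<String> tail =
                    range.getStartKey().isEmpty() ? keys : keys.tailSet(range.getStartKey(), false);
            List<String> page = new ArrayList<>();
            for (String key : tail) {
                if (page.size() == range.getCount()) {
                    break;
                }
                page.add(key);
            }
            return page;
        }
    }

    /** Client loop: feed the last key of each page back in as the next start key. */
    static List<String> readWholeIndex(IndexedNode node, String indexName, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> all = new ArrayList<>();
        NamedKeyRange range = new NamedKeyRange().setName(indexName).setCount(pageSize);
        while (true) {
            List<String> page = node.getRange(range);
            all.addAll(page);
            if (page.size() < pageSize) {
                return all; // a short (possibly empty) page means the index is exhausted
            }
            range.setStartKey(page.get(page.size() - 1));
        }
    }

    public static void main(String[] args) {
        IndexedNode node = new IndexedNode();
        for (int i = 0; i < 7; i++) {
            node.index("Cat1", "CATEGORY1." + i);
        }
        System.out.println(readWholeIndex(node, "Cat1", 3));
    }
}
```

Because each index is sorted by token, the keys come back scrambled relative to key order, which is exactly the trade-off described above: no ordering, but a direct scan over only the keys the predicate selects.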
I very much want something like this. I am willing to pay the price in disk space.
Yes, I know that something like this can be approximated with super columns. But super columns have well-known problems: first, practical limits on their size; second, the extra round-trips that working with them requires; and third, the cost of maintaining them by hand.