From Matthias Pfau <>
Subject Storing pre-sorted data
Date Wed, 12 Oct 2011 16:34:04 GMT
Hi there,
we are currently building a prototype based on Cassandra and ran into 
problems implementing sorted lists containing millions of items.

What is special about the items of our lists is that Cassandra cannot 
sort them, because the data is stored in a binary format which is not 
sortable. However, we are able to sort the data before the plain data 
gets encoded (our application is responsible for the order).

First Approach: Storing Lists in ColumnFamilies
We first tried to map the list to a single row of a ColumnFamily, so 
that each list index becomes a column name and each list item the 
corresponding column value. The column names are increasing numbers 
which define the sort order.
This has the major drawback that large parts of the list have to be 
rewritten on inserts, which are quite common, because the column names 
are numbered by their index.
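
To illustrate the rewrite cost, here is a minimal sketch in Python. It 
does not talk to Cassandra: the row is modeled as a plain dict from 
integer column name to column value, and insert_at is a hypothetical 
helper written only for this example. It counts how many columns would 
have to be written for a single insert.

```python
def insert_at(row, position, item):
    """Insert item at a list index; return the number of columns written.

    `row` is a dict standing in for one row: column name (int) -> value.
    """
    size = len(row)
    writes = 0
    # Every item at or after `position` moves to the next column name,
    # starting from the tail so nothing is overwritten too early.
    for index in range(size, position, -1):
        row[index] = row[index - 1]
        writes += 1
    # Finally write the new item into the freed column.
    row[position] = item
    writes += 1
    return writes


# A list of 1000 opaque (already encoded) items.
row = {i: f"item-{i}".encode() for i in range(1000)}
writes = insert_at(row, 10, b"new-item")
print(writes)  # 991 columns written for one insert
```

An insert near the head of the list touches almost every column, so 
the cost grows linearly with the list length.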

Second Approach: Storing the whole List as Binary Data
We tried to store the compressed list in a single column. However, this 
is only feasible for smaller lists. Our lists are far too big, leading 
to multi-megabyte reads and writes. As we need to read and update the 
lists quite often, this would put our Cassandra cluster under a lot of 
pressure.
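
The read-modify-write cycle behind this can be sketched as follows. 
This is only an illustration under assumptions: zlib and a 
newline-separated text encoding stand in for our real compression and 
binary format, items are assumed to contain no newlines, the list is 
assumed to be non-empty, and encode, decode and insert_item are 
hypothetical helper names.

```python
import zlib


def encode(items):
    """Serialize and compress the whole list into one column value."""
    return zlib.compress("\n".join(items).encode("utf-8"))


def decode(blob):
    """Decompress and deserialize the whole list."""
    return zlib.decompress(blob).decode("utf-8").split("\n")


def insert_item(blob, position, item):
    """Insert one item; the complete blob is read and rewritten."""
    items = decode(blob)          # full read of the column value
    items.insert(position, item)  # the actual change is tiny
    return encode(items)          # full write of the column value


blob = encode(["a", "c", "d"])
new_blob = insert_item(blob, 1, "b")
print(decode(new_blob))  # ['a', 'b', 'c', 'd']
```

Even a one-item change transfers the entire list in both directions, 
which is where the multi-megabyte reads and writes come from.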

Ideal Solution: Native support for storing lists
We would be very happy with a way to store a list of sorted values 
without making improper use of column names for the list index. This 
implies that we would need to be able to insert values at defined 
positions. We know that this could lead to problems with concurrent 
inserts in a distributed environment, but this is handled by our 
application logic.

What are your ideas on that?


