cassandra-user mailing list archives

From Matthias Pfau <p...@l3s.de>
Subject Re: Storing pre-sorted data
Date Wed, 12 Oct 2011 16:56:38 GMT
Unfortunately, that is not an option: we have to store the data in a
compressed and encrypted form, which makes it binary and non-sortable.

On 10/12/2011 06:39 PM, David McNelis wrote:
> Is it an option to not convert the data to binary prior to inserting
> into Cassandra?  Also, how large are the strings you're sorting?  If it's
> viable to not convert to binary before writing to Cassandra, and you use
> one of the string-based column ordering techniques (utf8, ascii, for
> example), then the data would be sorted without you needing to
> specifically worry about that.  Of course, if the strings are lengthy
> you could run into additional issues.
>
> On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <pfau@l3s.de
> <mailto:pfau@l3s.de>> wrote:
>
>     Hi there,
>     we are currently building a prototype based on Cassandra and ran
>     into problems implementing sorted lists containing millions of items.
>
>     The special thing about the items of our lists is that Cassandra is
>     not able to sort them, as the data is stored in a binary format which
>     is not sortable. However, we are able to sort the data before the
>     plain data gets encoded (our application is responsible for the order).
>
>     First Approach: Storing Lists in ColumnFamilies
>     ***
>     We first tried to map the list to a single row of a ColumnFamily in
>     a way that the index of the list is mapped to the column names and
>     the items of the list to the column values. The column names are
>     increasing numbers which define the sort order.
>     This has the major drawback that big parts of the list have to be
>     rewritten on inserts (because the column names are numbered by their
>     index), and inserts are quite common.
>
>
>     Second Approach: Storing the whole List as Binary Data
>     ***
>     We tried to store the compressed list in a single column. However,
>     this is only feasible for smaller lists. Our lists are far too big,
>     leading to multi-megabyte reads and writes. As we need to read and
>     update the lists quite often, this would put our Cassandra cluster
>     under a lot of pressure.
>
>     Ideal Solution: Native support for storing lists
>     ***
>     We would be very happy with a way to store a list of sorted values
>     without making improper use of column names for the list index. This
>     implies that we would need a possibility to insert values at defined
>     positions. We know that this could lead to problems with concurrent
>     inserts in a distributed environment, but this is handled by our
>     application logic.
>
>
>     What are your ideas on that?
>
>     Thanks
>     Matthias
>
>
>
>
> --
> *David McNelis*
> Lead Software Engineer
> Agentis Energy
> www.agentisenergy.com <http://www.agentisenergy.com>
> c: 219.384.5143
>
> /A Smart Grid technology company focused on helping consumers of energy
> control an often under-managed resource./
>
>
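
To make the rewrite cost of the first approach quoted above concrete, here is a minimal sketch. It does not talk to Cassandra: a plain Python dict stands in for a single row, with integer column names as list indexes and opaque byte strings as the (non-sortable) column values. The function name `insert_at` and the sample values are purely illustrative assumptions.

```python
from typing import Dict


def insert_at(row: Dict[int, bytes], index: int, item: bytes) -> int:
    """Insert item at list position index; return how many columns were written.

    row is a stand-in for one row whose column names are 0..n-1.
    """
    size = len(row)
    if not 0 <= index <= size:
        raise IndexError("index out of range")

    writes = 0
    # Every item at or after the insert position has to move to the next
    # column name, starting from the end so nothing is overwritten.
    for i in range(size - 1, index - 1, -1):
        row[i + 1] = row[i]
        writes += 1

    # Finally write the new item into the freed column.
    row[index] = item
    return writes + 1


# A small list of five opaque blobs, indexed by column name.
row = {i: f"blob-{i}".encode() for i in range(5)}
written = insert_at(row, 1, b"new-blob")
print(written)  # number of columns written for this single insert
```

In this sketch an insert at position k of an n-item list writes n - k + 1 columns, so inserts near the front of a list with millions of items touch almost every column.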

