cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthias Pfau <p...@l3s.de>
Subject Re: Storing pre-sorted data
Date Thu, 13 Oct 2011 07:47:18 GMT
Hi Stephen,
this is a great idea but unfortunately doesn't work for us either as we 
can not store the data in an unencrypted form.

Kind regards
Matthias

On 10/12/2011 07:42 PM, Stephen Connolly wrote:
> could you prefix the data with 3-4 bytes of a linear hash of the
> unencypted data? it wouldn't be a perfect sort, but you'd have less of a
> range to query to get the sorted values?
>
> - Stephen
>
> ---
> Sent from my Android phone, so random spelling mistakes, random nonsense
> words and other nonsense are a direct result of using swype to type on
> the screen
>
> On 12 Oct 2011 17:57, "Matthias Pfau" <pfau@l3s.de <mailto:pfau@l3s.de>>
> wrote:
>
>     Unfortunately, that is not an option as we have to store the data in
>     an compressed and encrypted and therefore binary and non-sortable form.
>
>     On 10/12/2011 06:39 PM, David McNelis wrote:
>
>         Is it an option to not convert the data to binary prior to inserting
>         into Cassandra?  Also, how large are the strings you're sorting?
>           If its
>         viable to not convert to binary before writing to Cassandra, and
>         you use
>         one of the string based column ordering techniques (utf8, ascii, for
>         example), then the data would be sorted without you  needing to
>         specifically worry about that.  Of course, if the strings are
>         lengthy
>         you could run into  additional issues.
>
>         On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <pfau@l3s.de
>         <mailto:pfau@l3s.de>
>         <mailto:pfau@l3s.de <mailto:pfau@l3s.de>>> wrote:
>
>             Hi there,
>             we are currently building a prototype based on cassandra and
>         came
>             into problems on implementing sorted lists containing
>         millions of items.
>
>             The special thing about the items of our lists is, that
>         cassandra is
>             not able to sort them as the data is stored in a binary
>         format which
>             is not sortable. However, we are able to sort the data
>         before the
>             plain data gets encoded (our application is responsible for
>         the order).
>
>             First Approach: Storing Lists in ColumnFamilies
>             ***
>             We first tried to map the list to a single row of a
>         ColumnFamily in
>             a way that the index of the list is mapped to the column
>         names and
>             the items of the list to the column values. The column names are
>             increasing numbers which define the sort order.
>             This has the major drawback that big parts of the list have
>         to be
>             rewritten on inserts (because the column names are numbered
>         by their
>             index), which are quite common.
>
>
>             Second Approach: Storing the whole List as Binary Data:
>             ***
>             We tried to store the compressed list in a single column.
>         However,
>             this is only feasible for smaller lists. Our lists are far
>         to big
>             leading to multi megabyte reads and writes. As we need to
>         read and
>             update the lists quite often, this would put our Cassandra
>         cluster
>             under a lot of pressure.
>
>             Ideal Solution: Native support for storing lists
>             ***
>             We would be very happy with a way to store a list of sorted
>         values
>             without making improper use of column names for the list
>         index. This
>             implies that we would need a possibility to insert values at
>         defined
>             positions. We know that this could lead to problems with
>         concurrent
>             inserts in a distributed environment, but this is handled by our
>             application logic.
>
>
>             What are your ideas on that?
>
>             Thanks
>             Matthias
>
>
>
>
>         --
>         *David McNelis*
>         Lead Software Engineer
>         Agentis Energy
>         www.agentisenergy.com <http://www.agentisenergy.com>
>         <http://www.agentisenergy.com>
>         c: 219.384.5143 <tel:219.384.5143>
>
>         /A Smart Grid technology company focused on helping consumers of
>         energy
>         control an often under-managed resource./
>
>
>


Mime
View raw message