cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Connolly <stephen.alan.conno...@gmail.com>
Subject Re: Storing pre-sorted data
Date Wed, 12 Oct 2011 17:42:50 GMT
could you prefix the data with 3-4 bytes of a linear hash of the unencypted
data? it wouldn't be a perfect sort, but you'd have less of a range to query
to get the sorted values?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 12 Oct 2011 17:57, "Matthias Pfau" <pfau@l3s.de> wrote:

> Unfortunately, that is not an option as we have to store the data in an
> compressed and encrypted and therefore binary and non-sortable form.
>
> On 10/12/2011 06:39 PM, David McNelis wrote:
>
>> Is it an option to not convert the data to binary prior to inserting
>> into Cassandra?  Also, how large are the strings you're sorting?  If its
>> viable to not convert to binary before writing to Cassandra, and you use
>> one of the string based column ordering techniques (utf8, ascii, for
>> example), then the data would be sorted without you  needing to
>> specifically worry about that.  Of course, if the strings are lengthy
>> you could run into  additional issues.
>>
>> On Wed, Oct 12, 2011 at 11:34 AM, Matthias Pfau <pfau@l3s.de
>> <mailto:pfau@l3s.de>> wrote:
>>
>>    Hi there,
>>    we are currently building a prototype based on cassandra and came
>>    into problems on implementing sorted lists containing millions of
>> items.
>>
>>    The special thing about the items of our lists is, that cassandra is
>>    not able to sort them as the data is stored in a binary format which
>>    is not sortable. However, we are able to sort the data before the
>>    plain data gets encoded (our application is responsible for the order).
>>
>>    First Approach: Storing Lists in ColumnFamilies
>>    ***
>>    We first tried to map the list to a single row of a ColumnFamily in
>>    a way that the index of the list is mapped to the column names and
>>    the items of the list to the column values. The column names are
>>    increasing numbers which define the sort order.
>>    This has the major drawback that big parts of the list have to be
>>    rewritten on inserts (because the column names are numbered by their
>>    index), which are quite common.
>>
>>
>>    Second Approach: Storing the whole List as Binary Data:
>>    ***
>>    We tried to store the compressed list in a single column. However,
>>    this is only feasible for smaller lists. Our lists are far to big
>>    leading to multi megabyte reads and writes. As we need to read and
>>    update the lists quite often, this would put our Cassandra cluster
>>    under a lot of pressure.
>>
>>    Ideal Solution: Native support for storing lists
>>    ***
>>    We would be very happy with a way to store a list of sorted values
>>    without making improper use of column names for the list index. This
>>    implies that we would need a possibility to insert values at defined
>>    positions. We know that this could lead to problems with concurrent
>>    inserts in a distributed environment, but this is handled by our
>>    application logic.
>>
>>
>>    What are your ideas on that?
>>
>>    Thanks
>>    Matthias
>>
>>
>>
>>
>> --
>> *David McNelis*
>> Lead Software Engineer
>> Agentis Energy
>> www.agentisenergy.com <http://www.agentisenergy.com>
>> c: 219.384.5143
>>
>> /A Smart Grid technology company focused on helping consumers of energy
>> control an often under-managed resource./
>>
>>
>>
>

Mime
View raw message