cassandra-user mailing list archives

From David Jeske <dav...@gmail.com>
Subject Re: Storing pre-sorted data
Date Tue, 18 Oct 2011 06:53:17 GMT
On Mon, Oct 17, 2011 at 2:39 AM, Matthias Pfau <pfau@l3s.de> wrote:

> We would be very happy if cassandra would give us an option to maintain the
> sort order on our own (application logic). That is why it would be
> interesting to hear from any of the developers if it would be easily
> possible to add such a feature to cassandra.


What you are describing above is option (b): you would do this by building
your sort order, encryption, and decryption into Cassandra. Let me
elaborate...

The database always has to know how to compute the sort order of items.
Deferring that to your code can happen in only two ways: in-process or
out-of-process. Deferring sort-order comparisons to out-of-process code
would have disastrous effects on performance, because comparisons are
performed many times for every single operation the database does.
Therefore, unless performance is irrelevant to your application, the only
feasible way to let your code maintain the sort order is "option b": build
your sort-order/encryption/decryption into the database. Cassandra would
have to initialize it at startup in order to read your database.
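
To make the in-process case concrete, here is a minimal sketch in Java of a
comparator the database could call directly. Everything in it is an
assumption for illustration: the class name DecryptingComparator, the use of
AES through javax.crypto, and ordering by the decrypted UTF-8 string are
mine, not part of Cassandra's API. ECB mode appears only to keep the sketch
short; it is not a recommendation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Comparator;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical comparator: orders encrypted values by their plaintext. */
final class DecryptingComparator implements Comparator<ByteBuffer> {
    // Assumption: one shared AES key; ECB is used only for brevity.
    private static final String TRANSFORMATION = "AES/ECB/PKCS5Padding";
    private final SecretKeySpec key;

    DecryptingComparator(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Helper for the example: encrypts a plaintext string. */
    ByteBuffer encrypt(String plaintext) {
        byte[] in = plaintext.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.wrap(run(Cipher.ENCRYPT_MODE, in));
    }

    private String decrypt(ByteBuffer buf) {
        // Copy through a duplicate so the caller's buffer position is untouched.
        ByteBuffer dup = buf.duplicate();
        byte[] in = new byte[dup.remaining()];
        dup.get(in);
        return new String(run(Cipher.DECRYPT_MODE, in), StandardCharsets.UTF_8);
    }

    private byte[] run(int mode, byte[] in) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, key);
            return cipher.doFinal(in);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cipher failure", e);
        }
    }

    /** Called in-process on every comparison; decrypts both sides first. */
    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        return decrypt(a).compareTo(decrypt(b));
    }
}
```

Note that every comparison here pays for two decryptions even in-process,
which gives a sense of why pushing the same call out-of-process would be so
much worse.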

Cassandra is open-source, so you can do this work on your own right now.
Aaron's message provided some pointers.

If you do go this route, you'll probably want to separate your
sort-order-and-encryption-handler into a separate JAR, and add some code to
Cassandra to load-and-register your classes when the database starts. You'd
submit this "stable data-format plug-in-API" patch to Cassandra, and
hopefully find a way to get it accepted into the main codebase. This would
make it easier for you to update to new versions, as you would be
dependent only on the public API, rather than on a private fork of Cassandra.
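
As a rough sketch of what "load-and-register at startup" might look like,
the Java below loads a plug-in class by name and registers it under a
stable identifier. The names SortOrderPlugin, SortOrderRegistry, and
UnsignedBytesOrder are hypothetical; no such plug-in API exists in Cassandra
as far as this thread establishes, and a real patch would have to fit
Cassandra's own type system.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plug-in contract: a named, in-process sort order. */
interface SortOrderPlugin extends Comparator<ByteBuffer> {
    String name();
}

/** Hypothetical registry, populated once when the database starts. */
final class SortOrderRegistry {
    private final Map<String, SortOrderPlugin> plugins = new ConcurrentHashMap<>();

    /** Loads the class (e.g. from a separate JAR on the classpath) and registers it. */
    void loadAndRegister(String className) {
        try {
            SortOrderPlugin plugin = Class.forName(className)
                    .asSubclass(SortOrderPlugin.class)
                    .getDeclaredConstructor()
                    .newInstance();
            plugins.put(plugin.name(), plugin);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot load plug-in " + className, e);
        }
    }

    /** Returns the registered plug-in, or null if the name is unknown. */
    SortOrderPlugin lookup(String name) {
        return plugins.get(name);
    }
}

/** Example plug-in: plain unsigned byte-wise ordering. */
final class UnsignedBytesOrder implements SortOrderPlugin {
    @Override
    public String name() {
        return "unsigned-bytes";
    }

    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            // Absolute reads leave both buffer positions unchanged.
            int x = a.get(a.position() + i) & 0xff;
            int y = b.get(b.position() + i) & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.remaining(), b.remaining());
    }
}
```

The point of the stable name is that stored data would refer to
"unsigned-bytes" (or your own identifier) rather than to a class inside a
fork, so the handler JAR could be upgraded independently of the database.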


> Otherwise, it seems like we have to implement sth. based on strategy (a)
> because (b) is not feasible for us and (c) is a rather young research topic
> which is slowly gaining more attention.
>

Certainly option (a) is the most straightforward method if you wish to keep
your codebase completely separate from your database (whether Cassandra or
not). Whether this is an acceptable security risk or not is up to you.

--------

Pulling back from implementation issues, I wonder if you might share a bit
more about the reason you need this functionality for your application. Here
are a few questions I'm curious about:

1) Is all of the data encrypted with a single key, or do different records
use different keys?
2) If a single key, would adding file-, block-, or record-level encryption
to Cassandra solve this problem? If not, why not? Is there something special
about your encryption methods?
3) Is the compression of the data somehow special, such that block-level
compression (either zlib, snappy, or even a custom-implemented scheme) is
not viable? If so, why?
4) Is there something special about the sorting that makes it hard to expose
the sort order to a database (other than Cassandra's lack of general
composite-key sorting)?
