Message-ID: <4E9D2772.5060005@l3s.de>
Date: Tue, 18 Oct 2011 09:14:58 +0200
From: Matthias Pfau
To: user@cassandra.apache.org
Subject: Re: Storing pre-sorted data

Aaron,
we want to sort entirely on the client side (where the data is encrypted). But that requires an "insert at offset X" operation. We would always use CL QUORUM and client-side synchronisation.

However, it does not seem to be a good idea to add such a feature to cassandra, as everyone using it would have to use client-side synchronisation and QUORUM.

Kind regards
Matthias

On 10/17/2011 10:38 PM, aaron morton wrote:
> Sort order is determined by the Comparator, which is an implementation of the o.a.c.db.marshal.AbstractType class.
>
> If you wish to order column (names) in a row based on an opaque (to cassandra) byte value, you can create your own implementation. You would then need to decrypt and compare column values. I've no idea how feasible that is in your situation with regard to security and performance.
>
> Also, without understanding how you want to compare the values, composite types may be useful, as you can make some parts of the opaque value visible to cassandra for sorting. If you can provide more info on how you want the sorting done (I may have missed it), that would be handy.
>
> Insert at offset X, even with client side synchronization, would be troublesome at CL ONE.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/10/2011, at 10:39 PM, Matthias Pfau wrote:
>
>> David,
>> thanks for your nice summary on this topic.
>>
>> We would be very happy if cassandra gave us an option to maintain the sort order on our own (in application logic). That is why it would be interesting to hear from any of the developers whether it would be easy to add such a feature to cassandra.
>>
>> Otherwise, it seems like we have to implement something based on strategy (a), because (b) is not feasible for us and (c) is a rather young research topic which is slowly gaining more attention.
>>
>> Kind Regards
>> Matthias
>>
>> On 10/15/2011 10:55 PM, David Jeske wrote:
>>> Logically, whether you use cassandra or not, there is some "physics" of
>>> sorted order structures which you should understand and which dictates
>>> what is possible.
>>>
>>> In order to keep data sorted, a database needs to be able to see the
>>> proper sort-order of the data "all the time", not just at insertion or
>>> query time. When inserting a new record, it is compared with existing
>>> records to put it in the "right place".
>>>
>>> As a result, whether you use cassandra or a different system, I believe
>>> you are limited to one of these strategies:
>>>
>>> (a) Encrypt the data outside the database with non-order-preserving
>>> encryption, and expose some "actual data" in unencrypted form for
>>> sorting. Since the encryption ruins the sort order, some "actual data"
>>> must be exposed to sort properly. Any data you expose, even if encoded,
>>> would be your actual data, because otherwise it wouldn't sort in the
>>> right order. You can limit the amount of data you expose, creating
>>> buckets instead of proper detailed sorting. Within buckets, only the
>>> agent capable of decrypting the data would be able to properly order the
>>> data within a bucket.
>>>
>>> (b) Encrypt the data inside the database. This would expose the "actual
>>> data" to the database, allowing it to keep it in proper order. The code
>>> to handle encryption would be handled after sort-order comparisons. The
>>> code (and keys) for decryption would also be known to the database. The
>>> data would need to be decryptable by the database at all times, because
>>> the database will need to compare new data to existing data in order to
>>> perform operations correctly.
>>>
>>> (c) Use an order-preserving encryption scheme. If the encryption output
>>> is in the same order as the source-data, then the database can sort on
>>> the encrypted data and get proper sort-results. I don't know anything
>>> about this field, but doing a google search returned the following paper...
>>>
>>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.8664&rep=rep1&type=pdf
>>>
>>>
>>> I believe these three cases represent the totality of what is possible in
>>> any data-storage system. So the solution you compose would involve one
>>> or more of these schemes.
>>>
>>> One might be tempted to generate some type of "ordinal value"
>>> representing the sort-order of an item. However, in order for this
>>> ordinal to be mathematically unrelated to the original data, it would
>>> have to be generated by a system which stored a copy of the entire data,
>>> which would then have to use one of the above three methods. (i.e. this
>>> approach is a chicken and egg problem)
>>
>
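
For illustration, strategy (a) from David's summary (expose only a coarse bucket for sorting, keep the rest opaque) might look roughly like the sketch below. Everything in it is an assumption made for the example and not taken from the thread or from cassandra: the `Store` class is only a stand-in for a row whose columns the database keeps sorted by name, the bucket is simply the first character of the value, and `toy_encrypt` / `toy_decrypt` are placeholders that must not be used as real encryption.

```python
import hashlib
import os
from bisect import insort

KEY = b"demo-key"  # hypothetical client-side secret, never sent to the store


def _keystream(nonce: bytes, length: int) -> bytes:
    """Derive a pseudo-random byte stream from KEY and nonce (toy only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(KEY + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(plaintext: str) -> bytes:
    """Placeholder, NOT secure: XOR with a hash-derived stream.

    The output does not preserve the order of the plaintexts.
    """
    nonce = os.urandom(8)
    data = plaintext.encode("utf-8")
    body = bytes(a ^ b for a, b in zip(data, _keystream(nonce, len(data))))
    return nonce + body


def toy_decrypt(blob: bytes) -> str:
    nonce, body = blob[:8], blob[8:]
    data = bytes(a ^ b for a, b in zip(body, _keystream(nonce, len(body))))
    return data.decode("utf-8")


def bucket_of(plaintext: str) -> str:
    """The only "actual data" exposed to the store: the first character."""
    return plaintext[:1].lower()


class Store:
    """Stand-in for the database: keeps (bucket, ciphertext) pairs sorted."""

    def __init__(self):
        self.columns = []

    def insert(self, bucket: str, ciphertext: bytes) -> None:
        insort(self.columns, (bucket, ciphertext))

    def scan(self):
        return list(self.columns)


def client_write(store: Store, value: str) -> None:
    store.insert(bucket_of(value), toy_encrypt(value))


def client_read_sorted(store: Store):
    """Buckets arrive in order; the exact order inside a bucket is
    restored on the client after decryption."""
    rows = [(bucket, toy_decrypt(blob)) for bucket, blob in store.scan()]
    return [value for _, value in sorted(rows)]
```

The store only ever sees the bucket and an opaque blob, so it can order buckets but not the entries inside one. A wider bucket leaks less but leaves more sorting work to the client, which is the trade-off David describes.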