Subject: Re: Storing pre-sorted data
From: aaron morton
Date: Tue, 18 Oct 2011 09:38:19 +1300
To: user@cassandra.apache.org

Sort order is determined by the Comparator, which is an implementation of the o.a.c.db.marshal.AbstractType class.

If you wish to order columns (by name) in a row based on a byte value that is opaque to Cassandra, you can create your own implementation. You would then need to decrypt and compare column values. I've no idea how feasible that is in your situation with regard to security and performance.

Also, without understanding how you want to compare the values, composite types may be useful, as you can make some parts of the opaque value visible to Cassandra for sorting.
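
To make the comparator idea concrete, here is a minimal sketch in Python rather than against Cassandra's Java API. The single-byte XOR "cipher", the key, and the function names are all hypothetical stand-ins chosen for illustration; the only point is that the comparator decrypts both names and orders them by plaintext, which raw byte order would not do.

```python
import functools

KEY = 0x5A  # hypothetical single-byte key; a toy stand-in, NOT real encryption


def toy_encrypt(plain: bytes) -> bytes:
    """XOR every byte with KEY. Illustrative only; offers no security."""
    return bytes(b ^ KEY for b in plain)


def toy_decrypt(opaque: bytes) -> bytes:
    """Inverse of toy_encrypt (XOR with the same key undoes it)."""
    return bytes(b ^ KEY for b in opaque)


def compare_names(a: bytes, b: bytes) -> int:
    """Comparator over opaque column names: decrypt both, compare the plaintext.

    Returns a negative, zero, or positive int, like a Java compare() method.
    """
    pa, pb = toy_decrypt(a), toy_decrypt(b)
    return (pa > pb) - (pa < pb)


# Column names as the store would see them (opaque bytes).
names = [toy_encrypt(n) for n in (b"pear", b"apple", b"fig")]

# Order produced by the custom comparator versus plain byte order.
ordered = sorted(names, key=functools.cmp_to_key(compare_names))
plain_order = [toy_decrypt(n) for n in ordered]
raw_order = [toy_decrypt(n) for n in sorted(names)]
```

Note that this only works if whatever runs the comparator holds the key, which is exactly the security trade-off mentioned above.
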
If you can provide more info on how you want the sorting done (I may have missed it), that would be handy.

Insert at offset X, even with client side synchronization, would be troublesome at CL ONE.

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 17/10/2011, at 10:39 PM, Matthias Pfau wrote:

> David,
> thanks for your nice summary on this topic.
>
> We would be very happy if Cassandra gave us an option to maintain the sort order on our own (in application logic). That is why it would be interesting to hear from any of the developers whether it would be easily possible to add such a feature to Cassandra.
>
> Otherwise, it seems like we have to implement something based on strategy (a), because (b) is not feasible for us and (c) is a rather young research topic which is slowly gaining more attention.
>
> Kind Regards
> Matthias
>
> On 10/15/2011 10:55 PM, David Jeske wrote:
>> Logically, whether you use Cassandra or not, there is some "physics" of
>> sorted order structures which you should understand and which dictates
>> what is possible.
>>
>> In order to keep data sorted, a database needs to be able to see the
>> proper sort order of the data "all the time", not just at insertion or
>> query time. When inserting a new record, it is compared with existing
>> records to put it in the "right place".
>>
>> As a result, whether you use Cassandra or a different system, I believe
>> you are limited to one of these strategies:
>>
>> (a) Encrypt the data outside the database with non-order-preserving
>> encryption, and expose some "actual data" in unencrypted form for
>> sorting. Since the encryption ruins the sort order, some "actual data"
>> must be exposed to sort properly. Any data you expose, even if encoded,
>> would be your actual data, because otherwise it wouldn't sort in the
>> right order.
>> You can limit the amount of data you expose, creating
>> buckets instead of proper detailed sorting. Within a bucket, only the
>> agent capable of decrypting the data would be able to properly order
>> the data.
>>
>> (b) Encrypt the data inside the database. This would expose the "actual
>> data" to the database, allowing it to keep it in proper order. The code
>> to handle encryption would run after sort-order comparisons. The
>> code (and keys) for decryption would also be known to the database. The
>> data would need to be decryptable by the database at all times, because
>> the database will need to compare new data to existing data in order to
>> perform operations correctly.
>>
>> (c) Use an order-preserving encryption scheme. If the encryption output
>> is in the same order as the source data, then the database can sort on
>> the encrypted data and get proper sort results. I don't know anything
>> about this field, but a Google search returned the following paper...
>>
>> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.133.8664&rep=rep1&type=pdf
>>
>> I believe these three cases represent the totality of what is possible in
>> any data-storage system. So the solution you compose would involve one
>> or more of these schemes.
>>
>> One might be tempted to generate some type of "ordinal value"
>> representing the sort order of an item. However, in order for this
>> ordinal to be mathematically unrelated to the original data, it would
>> have to be generated by a system which stored a copy of the entire data,
>> which would then have to use one of the above three methods (i.e. this
>> approach is a chicken-and-egg problem).
>
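
For what it's worth, strategy (a) with buckets can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the bucket is simply the first plaintext byte, the key name is made up, and the hash-based XOR stream is a toy stand-in for a real cipher (it is not secure). The store sorts only on the visible (bucket, ciphertext) bytes, and the client finishes the ordering inside each bucket after decrypting.

```python
import hashlib
import itertools
import os

CLIENT_KEY = b"held-by-client-only"  # hypothetical secret; the database never sees it
NONCE_LEN = 8


def _keystream(nonce, length):
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(CLIENT_KEY + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(plain):
    """Prefix a random nonce, then XOR the plaintext with the derived stream."""
    nonce = os.urandom(NONCE_LEN)
    stream = _keystream(nonce, len(plain))
    return nonce + bytes(p ^ s for p, s in zip(plain, stream))


def toy_decrypt(blob):
    """Split off the nonce and undo the XOR."""
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    stream = _keystream(nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def make_key(plain):
    """Composite key: (exposed bucket, opaque ciphertext). Bucket = first byte here."""
    return (plain[:1], toy_encrypt(plain))


def read_sorted(db_rows):
    """Client side: rows arrive grouped by bucket; finish the sort inside each bucket."""
    for _bucket, group in itertools.groupby(db_rows, key=lambda row: row[0]):
        yield from sorted(toy_decrypt(blob) for _, blob in group)


words = [b"banana", b"apple", b"avocado", b"blueberry", b"cherry"]
db_rows = sorted(make_key(w) for w in words)  # the database sorts on visible bytes only
result = list(read_sorted(db_rows))
```

The bucket width is the knob: a wider visible prefix leaks more of the actual data but leaves less sorting for the client, which is the trade-off described in (a).
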