Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 17652 invoked from network); 6 Jun 2010 09:16:58 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 6 Jun 2010 09:16:58 -0000 Received: (qmail 48939 invoked by uid 500); 6 Jun 2010 09:16:57 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 48782 invoked by uid 500); 6 Jun 2010 09:16:57 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 48774 invoked by uid 99); 6 Jun 2010 09:16:56 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jun 2010 09:16:56 +0000 X-ASF-Spam-Status: No, hits=2.8 required=10.0 tests=AWL,HTML_MESSAGE,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [209.85.160.172] (HELO mail-gy0-f172.google.com) (209.85.160.172) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jun 2010 09:16:48 +0000 Received: by gyh4 with SMTP id 4so1801251gyh.31 for ; Sun, 06 Jun 2010 02:16:27 -0700 (PDT) MIME-Version: 1.0 Received: by 10.150.238.1 with SMTP id l1mr12309807ybh.380.1275815786915; Sun, 06 Jun 2010 02:16:26 -0700 (PDT) Received: by 10.151.51.12 with HTTP; Sun, 6 Jun 2010 02:16:26 -0700 (PDT) X-Originating-IP: [80.179.102.198] Date: Sun, 6 Jun 2010 12:16:26 +0300 Message-ID: Subject: Tree Search in Cassandra From: David Boxenhorn To: Cassandra Mailing List Content-Type: multipart/alternative; boundary=000e0cd34a26c728a3048859025b --000e0cd34a26c728a3048859025b Content-Type: text/plain; charset=ISO-8859-1 I'm still thinking about the problem of how to handle range queries on very large sets of data, using Random Partitioning. Has anyone used tree search to solve this? What do you think? More specifically, something like this: - Store a maximum of 1000 values per supercolumn (or some other fixed number) - Each supercolumn has a "greaterChild" and a "lessChild" in addition to the values - When the number of values in the supercolumn grows beyond the maximum, split it into 3 parts, with the top third going into "greaterChild" and the bottom third into "lessChild" - To find a value, look at "greaterChild" and "lessChild" to find out whether your key is within the current range, and if not, where to look next - Range searches mean finding the first value, then looking at "greaterChild" or "lessChild" (depending on the direction of your search) until you reach the end of the range. Super Column Family: index [ [ "firstVal" : , "lastVal" : , : , "lessChild" : , "greaterChild" : ] --000e0cd34a26c728a3048859025b Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

I'm still thinking about the problem of how to handle = range queries on very large sets of data, using Random Partitioning.
Has anyone used tree search to solve this? What do you think?

More = specifically, something like this:

- Store a maximum of 1000 values per supercolumn (or some other fixed n= umber)
- Each supercolumn has a "greaterChild" and a "les= sChild" in addition to the values
- When the number of values in th= e supercolumn grows beyond the maximum, split it into 3 parts, with the top= third going into "greaterChild" and the bottom third into "= lessChild"
- To find a value, look at "greaterChild" and "lessChild&quo= t; to find out whether your key is within the current range, and if not, wh= ere to look next
- Range searches mean finding the first value, then loo= king at "greaterChild" or "lessChild" (depending on the= direction of your search) until you reach the end of the range.

Super Column Family= :

index [ <columnFamilyId> [ &= quot;firstVal" : <val> ,
=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 "lastVal&quo= t; : <val> ,
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0 <val> : <dataId>,
=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 "le= ssChild" : <columnFamilyId> ,
=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0 "greaterChild" : <columnFamilyId> ]

--000e0cd34a26c728a3048859025b--