Return-Path: Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: (qmail 17794 invoked from network); 6 Jun 2010 09:20:48 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 6 Jun 2010 09:20:48 -0000 Received: (qmail 50172 invoked by uid 500); 6 Jun 2010 09:20:47 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 50089 invoked by uid 500); 6 Jun 2010 09:20:47 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 50080 invoked by uid 99); 6 Jun 2010 09:20:47 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jun 2010 09:20:47 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=FREEMAIL_FROM,HTML_MESSAGE,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of rantav@gmail.com designates 209.85.214.172 as permitted sender) Received: from [209.85.214.172] (HELO mail-iw0-f172.google.com) (209.85.214.172) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 06 Jun 2010 09:20:40 +0000 Received: by iwn42 with SMTP id 42so2464237iwn.31 for ; Sun, 06 Jun 2010 02:20:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:mime-version:received:in-reply-to :references:from:date:message-id:subject:to:content-type; bh=YJ96tcLknP+HqPglJ8ft55utfibtpsqZq95RbcYY6Xk=; b=kiqr32F4CfX8iJzMUgdQyw28LvOPXHT5480xt4XaZwRO4AEyZRjwvmgqidhY3ALVmx Keh5EbQ0qhraLjdVs2G+jOUm5ONGV26AkwzwyMX1fRcGfMZicD9SIo+docYQ8CqcCVaK nUeib7g6aVb9r8StWvR0eoX4xnr6o//3ZL6+Q= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; b=jGKasu4iwbmOH89cTtWA8a7AjElwZR45vecCaebf/HpVtvQdJ8bm9j5TFB75Ol+Bb9 O7wjCPOqNgq1q4wBflJE7GCR6tTVmDgmJnvbjZcpjN1N115VeWzW8JkUZTojrW5Yy9Lg HHnbkinAW4dwBTaXzrfeGFZl7bn+KAlvfExsg= Received: by 10.231.69.71 with SMTP id y7mr2888380ibi.136.1275816019115; Sun, 06 Jun 2010 02:20:19 -0700 (PDT) MIME-Version: 1.0 Received: by 10.231.172.79 with HTTP; Sun, 6 Jun 2010 02:19:59 -0700 (PDT) In-Reply-To: References: From: Ran Tavory Date: Sun, 6 Jun 2010 12:19:59 +0300 Message-ID: Subject: Re: Tree Search in Cassandra To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=0015176f13109e3fa2048859105d X-Virus-Checked: Checked by ClamAV on apache.org --0015176f13109e3fa2048859105d Content-Type: text/plain; charset=UTF-8 sounds interesting... btree on top of cassandra ;) On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn wrote: > I'm still thinking about the problem of how to handle range queries on very > large sets of data, using Random Partitioning. > > Has anyone used tree search to solve this? What do you think? > > More specifically, something like this: > > - Store a maximum of 1000 values per supercolumn (or some other fixed > number) > - Each supercolumn has a "greaterChild" and a "lessChild" in addition to > the values > - When the number of values in the supercolumn grows beyond the maximum, > split it into 3 parts, with the top third going into "greaterChild" and the > bottom third into "lessChild" > - To find a value, look at "greaterChild" and "lessChild" to find out > whether your key is within the current range, and if not, where to look next > - Range searches mean finding the first value, then looking at > "greaterChild" or "lessChild" (depending on the direction of your search) > until you reach the end of the range. > > Super Column Family: > > index [ [ "firstVal" : , > "lastVal" : , > : , > "lessChild" : , > "greaterChild" : ] > > --0015176f13109e3fa2048859105d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
sounds interesting... btree on top of cassandra ;)

=
On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn= <david@lookin2.c= om> wrote:
I'm still thinking abo= ut the problem of how to handle range queries on very large sets of data, u= sing Random Partitioning.

Has anyone used tree search to solve this? What do you think?

Mo= re specifically, something like this:

- Store a maximum of 1000 values per supercolumn (or some other fixed n= umber)
- Each supercolumn has a "greaterChild" and a "les= sChild" in addition to the values
- When the number of values in th= e supercolumn grows beyond the maximum, split it into 3 parts, with the top= third going into "greaterChild" and the bottom third into "= lessChild"
- To find a value, look at "greaterChild" and "lessChild&quo= t; to find out whether your key is within the current range, and if not, wh= ere to look next
- Range searches mean finding the first value, then loo= king at "greaterChild" or "lessChild" (depending on the= direction of your search) until you reach the end of the range.

Super Column Family:<= /span>

index [ <columnFamilyId> [ "fir= stVal" : <val> ,
=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 "lastVal&quo= t; : <val> ,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 <val> : <dataId>,
=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 &q= uot;lessChild" : <columnFamilyId> ,
=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 "greaterChild" : <columnFamilyId> ]



--0015176f13109e3fa2048859105d--