Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
In-Reply-To: <AANLkTiloKtLVmaFMwR5Rtj9XlD4wS_F39OQSGSal6HpC@mail.gmail.com>
References: <AANLkTiloKtLVmaFMwR5Rtj9XlD4wS_F39OQSGSal6HpC@mail.gmail.com>
From: Ran Tavory <rantav@gmail.com>
Date: Sun, 6 Jun 2010 12:19:59 +0300
Message-ID: <AANLkTinPhYLHLQAMEUywzQhRehT5c_prLy22k2Jp7zzs@mail.gmail.com>
Subject: Re: Tree Search in Cassandra
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0015176f13109e3fa2048859105d

--0015176f13109e3fa2048859105d
Content-Type: text/plain; charset=UTF-8

Sounds interesting... a B-tree on top of Cassandra ;)

On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn <david@lookin2.com> wrote:

> I'm still thinking about the problem of how to handle range queries on very
> large sets of data, using Random Partitioning.
>
> Has anyone used tree search to solve this? What do you think?
>
> More specifically, something like this:
>
> - Store a maximum of 1000 values per supercolumn (or some other fixed
> number)
> - Each supercolumn has a "greaterChild" and a "lessChild" in addition to
> the values
> - When the number of values in the supercolumn grows beyond the maximum,
> split it into 3 parts, with the top third going into "greaterChild" and the
> bottom third into "lessChild"
> - To find a value, look at "greaterChild" and "lessChild" to find out
> whether your key is within the current range, and if not, where to look next
> - Range searches mean finding the first value, then looking at
> "greaterChild" or "lessChild" (depending on the direction of your search)
> until you reach the end of the range.
>
> Super Column Family:
>
> index [ <columnFamilyId> [ "firstVal" : <val> ,
>                            "lastVal" : <val> ,
>                            <val> : <dataId>,
>                            "lessChild" : <columnFamilyId> ,
>                            "greaterChild" : <columnFamilyId> ] ]
>
>
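
To make the steps quoted above concrete, here is a rough in-memory sketch in
Python. It is only an illustration under assumptions, not a Cassandra
implementation: the Node class, its method names and MAX_VALUES = 4 are all
hypothetical, plain Python lists and dicts stand in for supercolumns, and no
Cassandra client calls are made. The first and last entries of each node's
sorted key list play the role of "firstVal" and "lastVal".

```python
import bisect

MAX_VALUES = 4  # the proposal suggests 1000; kept small so splits are easy to see


class Node:
    """Hypothetical in-memory stand-in for one supercolumn of the index."""

    def __init__(self):
        self.keys = []             # sorted values; keys[0] / keys[-1] act as firstVal / lastVal
        self.data = {}             # value -> dataId
        self.less_child = None     # subtree holding values below keys[0]
        self.greater_child = None  # subtree holding values above keys[-1]

    def insert(self, val, data_id):
        # Route to a child only if that child exists and val is outside this node's range.
        if self.keys and val < self.keys[0] and self.less_child:
            self.less_child.insert(val, data_id)
        elif self.keys and val > self.keys[-1] and self.greater_child:
            self.greater_child.insert(val, data_id)
        else:
            if val not in self.data:
                bisect.insort(self.keys, val)
            self.data[val] = data_id
            if len(self.keys) > MAX_VALUES:
                self._split()

    def _split(self):
        # Bottom third goes to lessChild, top third to greaterChild, middle stays here.
        third = len(self.keys) // 3
        low, high = self.keys[:third], self.keys[-third:]
        self.keys = self.keys[third:-third]
        if self.less_child is None:
            self.less_child = Node()
        if self.greater_child is None:
            self.greater_child = Node()
        for v in low:
            self.less_child.insert(v, self.data.pop(v))
        for v in high:
            self.greater_child.insert(v, self.data.pop(v))

    def find(self, val):
        # Return the dataId for val, or None if it is not in the index.
        if val in self.data:
            return self.data[val]
        if self.keys and val < self.keys[0] and self.less_child:
            return self.less_child.find(val)
        if self.keys and val > self.keys[-1] and self.greater_child:
            return self.greater_child.find(val)
        return None

    def range_search(self, lo, hi):
        # Yield (value, dataId) pairs with lo <= value <= hi, in ascending order.
        if self.less_child and self.keys and lo < self.keys[0]:
            yield from self.less_child.range_search(lo, hi)
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        for v in self.keys[start:end]:
            yield v, self.data[v]
        if self.greater_child and self.keys and hi > self.keys[-1]:
            yield from self.greater_child.range_search(lo, hi)


root = Node()
for v in range(1, 10):
    root.insert(v, f"id{v}")
print(list(root.range_search(4, 6)))  # [(4, 'id4'), (5, 'id5'), (6, 'id6')]
```

Note that this sketch does no rebalancing, so inserting values in sorted order
gives a lopsided tree. Against a real cluster every hop to a child would also
be a separate read, which the sketch does not model.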

--0015176f13109e3fa2048859105d--