Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <AANLkTiklVfU8T1C0RdXDYajU54Bynnbe1NPWqKsWoO1K@mail.gmail.com>
References: <AANLkTiloKtLVmaFMwR5Rtj9XlD4wS_F39OQSGSal6HpC@mail.gmail.com>
	<AANLkTinPhYLHLQAMEUywzQhRehT5c_prLy22k2Jp7zzs@mail.gmail.com>
	<AANLkTiklVfU8T1C0RdXDYajU54Bynnbe1NPWqKsWoO1K@mail.gmail.com>
Date: Mon, 7 Jun 2010 09:39:08 +0300
Message-ID: <AANLkTil5HYGD_1ssiepR1uu3azFrQHGQeB3KGGTV4BEC@mail.gmail.com>
Subject: Re: Tree Search in Cassandra
From: David Boxenhorn <david@lookin2.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=000e0cd70c4008acb104886aee60

--000e0cd70c4008acb104886aee60
Content-Type: text/plain; charset=ISO-8859-1

Is batch mutate atomic? If not, can we make it so?
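
To make the question concrete, here is a minimal sketch of the 3-way split from the quoted proposal, done as three separate writes. It is plain Python, not a Cassandra client: a dict stands in for the super column family, the node ids are arbitrary strings, and `split_leaf`, `reachable_values` and `ClientCrash` are hypothetical names. It assumes the node being split has no children yet. If the client dies before the last write, both children exist but the parent never links to them:

```python
"""Why a node split needs atomicity (hypothetical in-memory model).

The dict below stands in for a super column family: node id -> columns.
It is NOT a Cassandra client; each assignment models one independent write.
"""


class ClientCrash(Exception):
    """Simulates a client dying partway through a multi-step update."""


def split_leaf(store, node_id, less_id, greater_id, crash_before=None):
    """Split a childless node into thirds using three separate writes.

    crash_before: index of the write (0, 1 or 2) at which to simulate a crash.
    """
    values = sorted(store[node_id]["values"])
    third = len(values) // 3
    less = values[:third]
    middle = values[third:len(values) - third]
    greater = values[len(values) - third:]
    writes = [
        (less_id, {"values": less}),
        (greater_id, {"values": greater}),
        (node_id, {"values": middle, "lessChild": less_id, "greaterChild": greater_id}),
    ]
    for i, (key, row) in enumerate(writes):
        if i == crash_before:
            raise ClientCrash(f"crashed before write {i + 1} of {len(writes)}")
        store[key] = row


def reachable_values(store, node_id):
    """Collect every value reachable from node_id by following child links."""
    row = store.get(node_id)
    if row is None:
        return []
    found = list(row["values"])
    for link in ("lessChild", "greaterChild"):
        if link in row:
            found += reachable_values(store, row[link])
    return found


store = {"root": {"values": list(range(9))}}
try:
    split_leaf(store, "root", "n1", "n2", crash_before=2)
except ClientCrash:
    pass

# The children were written, but the parent still holds all 9 values and has
# no child links: nodes "n1" and "n2" are orphaned duplicates.
print(sorted(store))                         # ['n1', 'n2', 'root']
print(len(reachable_values(store, "root")))  # 9
```

Writing the parent first does not help either: then the moved values are unreachable until the children are written. Either way the three writes need to land together, or readers and writers need some form of locking.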

On Mon, Jun 7, 2010 at 4:11 AM, Tatu Saloranta <tsaloranta@gmail.com> wrote:

> Yeah, or maybe just "clustering", since there is no branching structure.
> It's quite commonly useful even on regular b-tree style storage (BDB
> et al.), as it can reduce per-entry overhead quite a bit, and it allows
> very efficient compression if entries have lots of redundancy (XML or
> JSON serialized data).
>
> I doubt this can be done reliably from the client side. While it is a
> good idea from a functionality perspective, the problem is that it
> requires some level of atomic operations or locking, since updates are
> multi-step operations. On the server side, I guess it would be similar
> to the work on allowing atomic multi-part operations (like the ones
> being worked on to implement counters?).
>
> -+ Tatu +-
>
> On Sun, Jun 6, 2010 at 2:19 AM, Ran Tavory <rantav@gmail.com> wrote:
> > sounds interesting... btree on top of cassandra ;)
> >
> > On Sun, Jun 6, 2010 at 12:16 PM, David Boxenhorn <david@lookin2.com> wrote:
> >>
> >> I'm still thinking about the problem of how to handle range queries on
> >> very large sets of data, using Random Partitioning.
> >>
> >> Has anyone used tree search to solve this? What do you think?
> >>
> >> More specifically, something like this:
> >>
> >> - Store a maximum of 1000 values per supercolumn (or some other fixed
> >> number)
> >> - Each supercolumn has a "greaterChild" and a "lessChild" in addition to
> >> the values
> >> - When the number of values in the supercolumn grows beyond the maximum,
> >> split it into 3 parts, with the top third going into "greaterChild",
> >> the bottom third into "lessChild", and the middle third staying in place
> >> - To find a value, compare your key with "firstVal" and "lastVal" to find
> >> out whether it is within the current range, and if not, use
> >> "greaterChild" or "lessChild" to decide where to look next
> >> - Range searches mean finding the first value, then looking at
> >> "greaterChild" or "lessChild" (depending on the direction of your
> >> search) until you reach the end of the range.
> >>
> >> Super Column Family:
> >>
> >> index [ <columnFamilyId> [ "firstVal" : <val> ,
> >>                            "lastVal" : <val> ,
> >>                            <val> : <dataId>,
> >>                            "lessChild" : <columnFamilyId> ,
> >>                            "greaterChild" : <columnFamilyId> ] ]
> >>
> >
> >
>
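
For readers who want to experiment, the quoted proposal can be modeled in memory. This is a sketch under stated assumptions, not a Cassandra implementation: a Python dict stands in for the super column family, `SuperColumnTree`, `insert`, `find` and `scan` are hypothetical names, the node size is 6 instead of 1000, and "firstVal"/"lastVal" are computed from each node's entries rather than stored. In Cassandra, each `insert` that triggers `_split` would be several separate writes, which is where the atomicity question above comes in.

```python
"""Hypothetical in-memory model of the proposed supercolumn tree.

Each node mirrors one supercolumn in the proposed schema: <val> -> <dataId>
entries plus optional "lessChild"/"greaterChild" links to other nodes.
"firstVal"/"lastVal" are derived (min/max of the entries), not stored.
"""
import itertools


class SuperColumnTree:
    def __init__(self, max_values=6):  # 6 stands in for the proposed 1000
        self.max_values = max_values
        self.nodes = {}  # node id -> {"vals": {...}, optional child links}
        self._ids = itertools.count()
        self.root = self._new_node()

    def _new_node(self):
        node_id = next(self._ids)
        self.nodes[node_id] = {"vals": {}}
        return node_id

    def insert(self, val, data_id, node_id=None):
        node_id = self.root if node_id is None else node_id
        node = self.nodes[node_id]
        vals = node["vals"]
        # Descend only when val is outside [firstVal, lastVal] and a child exists.
        if vals and val < min(vals) and "lessChild" in node:
            return self.insert(val, data_id, node["lessChild"])
        if vals and val > max(vals) and "greaterChild" in node:
            return self.insert(val, data_id, node["greaterChild"])
        vals[val] = data_id
        if len(vals) > self.max_values:
            self._split(node_id)

    def _split(self, node_id):
        # Keep the middle third here; push the bottom/top thirds into children.
        node = self.nodes[node_id]
        keys = sorted(node["vals"])
        third = len(keys) // 3
        if third == 0:
            return
        moves = (("lessChild", keys[:third]), ("greaterChild", keys[-third:]))
        for link, moved in moves:
            if link not in node:
                node[link] = self._new_node()
            for k in moved:
                self.insert(k, node["vals"].pop(k), node[link])

    def find(self, val):
        node_id = self.root
        while node_id is not None:
            node = self.nodes[node_id]
            vals = node["vals"]
            if val in vals:
                return vals[val]
            if vals and val < min(vals):
                node_id = node.get("lessChild")
            elif vals and val > max(vals):
                node_id = node.get("greaterChild")
            else:
                return None  # inside this node's range, so not stored anywhere
        return None

    def scan(self, lo, hi, node_id=None):
        """Yield (val, dataId) for lo <= val <= hi in ascending order.

        In-order walk: values following a child's last entry can live in the
        parent, so this recurses instead of only following "greaterChild".
        """
        node_id = self.root if node_id is None else node_id
        node = self.nodes[node_id]
        keys = sorted(node["vals"])
        if "lessChild" in node and (not keys or lo < keys[0]):
            yield from self.scan(lo, hi, node["lessChild"])
        for k in keys:
            if lo <= k <= hi:
                yield k, node["vals"][k]
        if "greaterChild" in node and (not keys or hi > keys[-1]):
            yield from self.scan(lo, hi, node["greaterChild"])


tree = SuperColumnTree(max_values=6)
for v in [50, 10, 90, 30, 70, 20, 80, 40, 60, 5, 95]:
    tree.insert(v, f"data-{v}")

print(tree.find(30))                      # data-30
print(tree.find(31))                      # None
print([v for v, _ in tree.scan(25, 75)])  # [30, 40, 50, 60, 70]
```

Two design notes on this sketch: it does no rebalancing (the proposal doesn't describe any), so sorted input degrades into a long chain of nodes; and the range scan is an in-order walk rather than a single pass along "greaterChild" links, since the successor of a child's last value may sit in its parent.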

--000e0cd70c4008acb104886aee60--