Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <AANLkTin3u8KzilGm7YpRDcWBFbLQqUwr_uSTMees1X1x@mail.gmail.com>
References: <m2ueb27b681005071616y7564c8d1pc8f563e406ecb2ea@mail.gmail.com>
	 <AANLkTin3u8KzilGm7YpRDcWBFbLQqUwr_uSTMees1X1x@mail.gmail.com>
Date: Tue, 11 May 2010 17:03:21 +0300
Message-ID: <AANLkTilnKrt-df2VZUX1qTLo7SYdqyrzorEytc7k8GuK@mail.gmail.com>
Subject: Re: Is multiget_slice performant when you're looking for lots of
	keys?
From: David Boxenhorn <david@lookin2.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=00163642651bf99d49048651fcac

--00163642651bf99d49048651fcac
Content-Type: text/plain; charset=ISO-8859-1

I have a similar issue, but I can't create a CF per type, because types are
an open-ended set in my case (they are geographical locations). So I wanted
to have one CF for types, and a supercolumn for each type, with the record
keys stored as columns in each supercolumn.

Is it a problem for me to have millions of columns in a supercolumn?
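
To make the layout concrete, here is a minimal sketch in Python. It is only
an in-memory model built from plain dictionaries, not a Cassandra client
API. The row key "locations", the function names, and the page size are all
hypothetical and chosen just for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory model of the layout, NOT a Cassandra client API:
# row key -> supercolumn (one per type) -> column name (record key) -> value
types_cf = defaultdict(lambda: defaultdict(dict))


def add_record(row_key, location, record_key):
    """Store record_key as a column under the supercolumn for its type."""
    types_cf[row_key][location][record_key] = b""  # the value is unused here


def keys_for_type(row_key, location, start="", count=100):
    """Return one page of record keys for a type, in sorted column order.

    The page begins at the first column name >= start and holds at most
    `count` names, so a huge supercolumn never has to be read in one call.
    """
    columns = sorted(types_cf[row_key][location])
    return [name for name in columns if name >= start][:count]


add_record("locations", "Haifa", "rec2")
add_record("locations", "Haifa", "rec1")
add_record("locations", "Jaffa", "rec9")
```

With millions of columns per type, the idea would be to walk a supercolumn
page by page, passing the last key of one page as the start of the next,
rather than asking for everything at once.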

On Tue, May 11, 2010 at 4:29 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> multiget performs in O(N) with the number of rows requested.  so will
> range scanning.
>
> if you want to query millions of records of one type i would create a
> CF per type and use hadoop to parallelize the computation.
>
> On Fri, May 7, 2010 at 6:16 PM, James <rent.lupin.road@gmail.com> wrote:
> > Hi all,
> > Apologies if I'm still stuck in RDBMS mentality - first project using
> > Cassandra!
> > I'll be using Cassandra to store quite a lot (10s of millions) of
> > records, each of which has a type.
> > I'll want to query the records to get all of a certain type; it's an
> > analogous situation to the TaggedPosts schema from Arin's blog post
> > (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> > The thing is, each type (or tag) row key will be pointing at millions of
> > records. I know I can use multiget_slice with all those record IDs as one
> > request, but is this The Right Way of "filtering" a large column family
> > by type?
> > Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> > Thanks!
> > James
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

--00163642651bf99d49048651fcac--