cassandra-user mailing list archives

From David Boxenhorn <da...@lookin2.com>
Subject Re: Is multiget_slice performant when you're looking for lots of keys?
Date Tue, 11 May 2010 14:03:21 GMT
I have a similar issue, but I can't create a CF per type, because types are
an open-ended set in my case (they are geographical locations). So I wanted
to have one CF for types, and a supercolumn for each type, with the keys as
columns per supercolumn.

Is it a problem for me to have millions of columns in a supercolumn?
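
To make the layout concrete, here is a minimal sketch in plain Python. It is only an in-memory stand-in, not Cassandra client code: the dict `types_cf`, the location names, the record keys, and the helper `page_columns` are all hypothetical, and the row key that would hold the supercolumns is left out. It shows one supercolumn per type with record keys as column names, read back in fixed-size pages rather than all at once.

```python
from itertools import islice

# Hypothetical stand-in for the proposed layout: one "types" column family,
# one supercolumn per type (a location), and one column per record key.
# Column values are left empty because only the column names matter here.
types_cf = {
    "london": {"rec-0001": "", "rec-0002": "", "rec-0003": ""},
    "paris": {"rec-0004": ""},
}


def page_columns(supercolumn, page_size):
    """Yield lists of column names in sorted order, page_size at a time."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    names = iter(sorted(supercolumn))
    while True:
        page = list(islice(names, page_size))
        if not page:
            return
        yield page


# Read the record keys of one type two at a time.
pages = list(page_columns(types_cf["london"], 2))
```

With millions of columns per supercolumn, the number of pages grows in step with the number of record keys, so the page size would be the main knob in a scheme like this.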

On Tue, May 11, 2010 at 4:29 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> multiget performs in O(N) with the number of rows requested.  so will
> range scanning.
>
> if you want to query millions of records of one type i would create a
> CF per type and use hadoop to parallelize the computation.
>
> On Fri, May 7, 2010 at 6:16 PM, James <rent.lupin.road@gmail.com> wrote:
> > Hi all,
> > Apologies if I'm still stuck in RDBMS mentality - first project using
> > Cassandra!
> > I'll be using Cassandra to store quite a lot (10s of millions) of
> > records, each of which has a type.
> > I'll want to query the records to get all of a certain type; it's an
> > analogous situation to the TaggedPosts schema from Arin's blog post
> > (http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model).
> > The thing is, each type (or tag) row key will be pointing at millions of
> > records. I know I can use multiget_slice with all those record IDs as one
> > request, but is this The Right Way of "filtering" a large column family
> > by type?
> > Coming from an RDBMS-ingrained mindset, it seems kind of awkward...
> > Thanks!
> > James
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
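
The O(N) point in the quoted reply can be illustrated with a small sketch, again in plain Python with no Cassandra client involved. Everything here is an assumption for illustration: `fake_multiget` stands in for a multiget-style call, `store` is a toy dict of rows, and `chunked` and `fetch_all` are hypothetical helpers. Splitting the key list into batches changes how many requests are made, but the total number of rows touched still equals the number of keys requested.

```python
def chunked(keys, batch_size):
    """Split a list of row keys into consecutive batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]


def fetch_all(keys, batch_size, fetch_batch):
    """Call fetch_batch once per batch and merge the per-batch results."""
    results = {}
    for batch in chunked(keys, batch_size):
        results.update(fetch_batch(batch))
    return results


# Toy data and a stand-in for a multiget-style call (both hypothetical).
store = {f"rec-{i}": {"type": "london"} for i in range(10)}
batch_sizes = []  # records how many rows each simulated request touched


def fake_multiget(batch):
    batch_sizes.append(len(batch))
    return {key: store[key] for key in batch}


keys = sorted(store)
result = fetch_all(keys, 4, fake_multiget)
rows_touched = sum(batch_sizes)
```

Here ten keys in batches of four produce three requests, and `rows_touched` is still ten: batching bounds the size of each request but does not reduce the overall work.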
