incubator-cassandra-user mailing list archives

From Ed Anuff ...@anuff.com>
Subject Re: Customized Secondary Index Schema
Date Thu, 25 Aug 2011 20:51:56 GMT
Agreed, that's what I meant by "there are a lot of simple ways to split it
up over multiple rows", assuming it's necessary.
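
A minimal sketch of two such splits, using a plain in-memory dict as a stand-in for the index column family (this models the layout only and makes no Cassandra client calls). The first follows the row-per-value keys described in the quoted message below, with the main column family key as the column name. The second, a row per value prefix, is an illustrative assumption and is not something proposed in the thread. All function names and the `user-N` keys are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for an index column family: row key -> {column name: value}.
# This only models the data layout; it is not a Cassandra API.
index_cf = defaultdict(dict)

def row_key_per_value(index_name, value):
    """One index row per indexed value, e.g. 'User_Keys_By_Last_Name_adams'."""
    return f"{index_name}_{value}"

def row_key_per_prefix(index_name, value, prefix_len=1):
    """One index row per value prefix, e.g. 'User_Keys_By_Last_Name_a'.
    (Assumed alternative, not taken from the thread.)"""
    return f"{index_name}_{value[:prefix_len]}"

def add_per_value(index_name, value, main_key):
    # The main column family key is stored as the column name;
    # the column value is left empty.
    index_cf[row_key_per_value(index_name, value)][main_key] = ""

def lookup_per_value(index_name, value):
    # Return the main column family keys indexed under this value, sorted.
    row = index_cf.get(row_key_per_value(index_name, value), {})
    return sorted(row)

add_per_value("User_Keys_By_Last_Name", "adams", "user-1")
add_per_value("User_Keys_By_Last_Name", "alden", "user-2")
add_per_value("User_Keys_By_Last_Name", "adams", "user-3")
```

With a hash-based partitioner such as RandomPartitioner, each of these row keys is placed independently, so the index spreads across the cluster. The trade-off is that an ordered scan over last names no longer comes from a single row.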

On Thu, Aug 25, 2011 at 4:24 PM, Konstantin Naryshkin
<konstantinn@a-bb.net> wrote:

> Why are you keeping all your indexes in the same row? We do a similar thing
> (maintain several indexes over the same data) and we just have an index
> column family with keys like "dest192.168.0.1" which means destination index
> of 192.168.0.1. You can do rows like User_Keys_By_Last_Name_adams and
> User_Keys_By_Last_Name_alden. You can keep the matching main column family
> key as the column name. This will ensure that your index is evenly
> distributed throughout your cluster.
>
> ----- Original Message -----
> From: "Ed Anuff" <ed@anuff.com>
> To: user@cassandra.apache.org
> Sent: Thursday, August 25, 2011 12:48:49 PM
> Subject: Re: Customized Secondary Index Schema
>
> How many unique last names do you anticipate having? How many characters in
> the last name do you anticipate keeping in your index? You can easily do the
> math to figure out how many you could fit on a node. I think you'll find
> that the ceiling might be quite a bit higher than you think. If you have
> over a couple of hundred million users it might not be the best approach.
> There are a lot of very simple ways to split it up over multiple rows. As is
> the case with most things regarding Cassandra, the off-the-cuff assumptions
> only get you so far before you have to do some math and do some tests.
>
> As I mentioned in my talk, for simple use cases like this, you probably
> should just start with the built-in secondary indexes, but I assume you
> have already explored those.
>
> Ed
>
>
> On Thu, Aug 25, 2011 at 9:27 AM, Alvin UW < alvinuw@gmail.com > wrote:
>
>
> Yes, this is what I am worrying about.
>
>
> 2011/8/24 Ryan King < ryan@twitter.com >
>
>
>
>
>
> On Tue, Aug 23, 2011 at 10:03 AM, Alvin UW < alvinuw@gmail.com > wrote:
> > Hello,
> >
> > As mentioned by Ed Anuff in his blog and slides, one way to build a
> > customized secondary index is:
> > We use one CF, with each row representing a secondary index and the
> > secondary index name as the row key.
> > For example,
> >
> > Indexes = {
> >   "User_Keys_By_Last_Name" : {
> >     "adams" : "e5d61f2b-…",
> >     "alden" : "e80a17ba-…",
> >     "anderson" : "e5d61f2b-…",
> >     "davis" : "e719962b-…",
> >     "doe" : "e78ece0f-…",
> >     "franks" : "e66afd40-…",
> >     … : …,
> >   }
> > }
> >
> > But the whole secondary index is stored on a single node, because of
> > the row key.
> > All the queries against this secondary index will go to this node. Of
> > course, there are some replica nodes.
> >
> > Do you think this is a scalability problem? Is there a better way to
> > solve it?
>
> It's certainly a scalability problem in that this solution has a hard
> ceiling (this index can't get larger than the capacity of any single
> node). It will probably work on small datasets, but if your dataset is
> small then why are you using Cassandra?
>
> -ryan
>
>
>
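
The back-of-the-envelope math suggested in the quoted thread can be sketched roughly as below. Every input is an assumption rather than a figure from the thread: 16-byte keys (a UUID stored in binary form), a fixed 15 bytes of per-column serialization overhead, and hypothetical entry counts and node budgets. The estimate ignores replication, compaction headroom, and any practical limits on very wide rows, so it is a starting point for testing and not a sizing guarantee.

```python
def index_row_bytes(num_entries, avg_name_bytes,
                    key_bytes=16, per_column_overhead=15):
    """Rough on-disk size of a single-row index.

    Each entry is one column: the indexed name, the main column family key,
    and an assumed fixed per-column overhead.
    """
    return num_entries * (avg_name_bytes + key_bytes + per_column_overhead)

def max_entries(node_budget_bytes, avg_name_bytes,
                key_bytes=16, per_column_overhead=15):
    """How many index entries fit within a given per-node byte budget."""
    per_entry = avg_name_bytes + key_bytes + per_column_overhead
    return node_budget_bytes // per_entry

# Hypothetical inputs: 200 million entries, 10-byte names.
estimate = index_row_bytes(200_000_000, 10)
print(f"{estimate / 1e9:.1f} GB")
```

Under these assumptions, 200 million entries come to roughly 8.2 GB in one row, which is why the ceiling may be higher than expected but is still worth measuring.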
