incubator-cassandra-user mailing list archives

From Ed Anuff ...@anuff.com>
Subject Re: Customized Secondary Index Schema
Date Thu, 25 Aug 2011 16:48:49 GMT
How many unique last names do you anticipate having?  How many characters in
the last name do you anticipate keeping in your index?  You can easily do
the math to figure out how many you could fit on a node.  I think you'll
find that the ceiling might be quite a bit higher than you think.  If you
have over a couple of hundred million users it might not be the best
approach.  There are a lot of very simple ways to split it up over multiple
rows.  As is the case with most things regarding Cassandra, the off-the-cuff
assumptions only get you so far before you have to do some math and do some
tests.
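
To make "do the math" concrete, here is a rough sizing sketch in Python for the single-row index described in the question (one column per distinct last name, whose value is one user key). It assumes 16-byte binary UUIDs as values and about 15 bytes of per-column overhead (name length, flags, timestamp and value length), which is an assumption about the storage format rather than a measured figure. The name count, average name length and the 1 GiB per-node budget are hypothetical inputs, not numbers from this thread.

```python
# Back-of-envelope sizing for a single-row index:
# one column per distinct last name, whose value is one user key.
# Every default below is an assumption to replace with your own numbers.

PER_COLUMN_OVERHEAD = 15  # assumed: 2-byte name length + 1-byte flags
                          # + 8-byte timestamp + 4-byte value length


def bytes_per_column(avg_name_bytes: int, value_bytes: int = 16) -> int:
    """Serialized size of one index column (name + value + overhead)."""
    return avg_name_bytes + value_bytes + PER_COLUMN_OVERHEAD


def index_row_bytes(unique_names: int, avg_name_bytes: int,
                    value_bytes: int = 16) -> int:
    """Approximate size of the whole index row on one replica,
    ignoring compression and per-row bookkeeping."""
    return unique_names * bytes_per_column(avg_name_bytes, value_bytes)


def max_names(budget_bytes: int, avg_name_bytes: int,
              value_bytes: int = 16) -> int:
    """How many distinct names fit in a given per-node byte budget."""
    return budget_bytes // bytes_per_column(avg_name_bytes, value_bytes)


if __name__ == "__main__":
    # Hypothetical inputs: 1.5 million distinct last names of ~8 bytes,
    # each mapped to a 16-byte binary UUID.
    print(index_row_bytes(1_500_000, 8))  # 58500000 bytes, about 56 MiB
    # Hypothetical budget: 1 GiB set aside for this row on each node.
    print(max_names(2**30, 8))            # 27531841
```

Multiply by the replication factor for cluster-wide disk use; the per-node figure is what matters for the ceiling, since every replica holds the entire row.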

As I mentioned in my talk, for simple use cases like this you should
probably just start with the built-in secondary indexes, but I assume you
have already explored those.

Ed

On Thu, Aug 25, 2011 at 9:27 AM, Alvin UW <alvinuw@gmail.com> wrote:

> Yes, this is what I am worrying about.
>
> 2011/8/24 Ryan King <ryan@twitter.com>
>
>> On Tue, Aug 23, 2011 at 10:03 AM, Alvin UW <alvinuw@gmail.com> wrote:
>> > Hello,
>> >
>> > As mentioned by Ed Anuff in his blog and slides, one way to build a
>> > customized secondary index is:
>> > We use one CF, with each row representing a secondary index and the
>> > secondary index name as the row key.
>> > For example,
>> >
>> > Indexes = {
>> > "User_Keys_By_Last_Name" : {
>> > "adams" : "e5d61f2b-…",
>> > "alden" : "e80a17ba-…",
>> > "anderson" : "e5d61f2b-…",
>> > "davis" : "e719962b-…",
>> > "doe" : "e78ece0f-…",
>> > "franks" : "e66afd40-…",
>> > … : …,
>> > }
>> > }
>> >
>> > But the whole secondary index is stored in a single row, so it lives on
>> > a single node (the one that owns that row key).
>> > All queries against this secondary index will go to that node and, of
>> > course, its replicas.
>> >
>> > Do you think this is a scalability problem, or is there a better
>> > solution?
>>
>> It's certainly a scalability problem in that this solution has a hard
>> ceiling (this index can't get larger than the capacity of any single
>> node). It will probably work on small datasets, but if your dataset is
>> small, then why are you using Cassandra?
>>
>> -ryan
>>
>
>
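
Ryan's hard-ceiling point, and Ed's remark that the index can be split over multiple rows, can be illustrated with a small in-memory model in Python. It assumes one bucket row per first letter, with a hypothetical row key of the form `User_Keys_By_Last_Name:<letter>`; `BucketedIndex`, `bucket_key` and `prefix_slice` are invented names, and nothing here talks to a real cluster. Because each bucket has its own row key, the partitioner can place the buckets on different nodes, so no single node has to hold the whole index.

```python
# In-memory model of a hand-built index split across several rows.
# Each "row" is a list of (column_name, value) pairs kept sorted by name,
# mimicking how a row's columns are ordered. A column name maps to one
# value, as in the single-row example, so re-inserting a name overwrites it.
# The bucket scheme and row-key format are hypothetical.
from bisect import bisect_left
from collections import defaultdict

INDEX_NAME = "User_Keys_By_Last_Name"


def bucket_key(last_name: str) -> str:
    """Row key for the bucket holding this name: one row per first letter."""
    if not last_name:
        raise ValueError("last name must be non-empty")
    return f"{INDEX_NAME}:{last_name[0].lower()}"


class BucketedIndex:
    def __init__(self) -> None:
        # row key -> sorted list of (last_name, user_key) columns
        self.rows: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def insert(self, last_name: str, user_key: str) -> None:
        name = last_name.lower()
        row = self.rows[bucket_key(name)]
        i = bisect_left(row, (name,))  # first column >= name
        if i < len(row) and row[i][0] == name:
            row[i] = (name, user_key)  # same column name: overwrite
        else:
            row.insert(i, (name, user_key))

    def prefix_slice(self, prefix: str) -> list[tuple[str, str]]:
        """Columns whose names start with `prefix`; reads one row only."""
        prefix = prefix.lower()
        row = self.rows.get(bucket_key(prefix), [])
        out = []
        for name, key in row[bisect_left(row, (prefix,)):]:
            if not name.startswith(prefix):
                break  # columns are sorted, so no later match exists
            out.append((name, key))
        return out


if __name__ == "__main__":
    idx = BucketedIndex()
    # Names from the question; the keys are shortened placeholders.
    for name, key in [("adams", "e5d61f2b"), ("alden", "e80a17ba"),
                      ("anderson", "e5d61f2b"), ("davis", "e719962b"),
                      ("doe", "e78ece0f"), ("franks", "e66afd40")]:
        idx.insert(name, key)
    print(sorted(idx.rows))        # three bucket rows instead of one
    print(idx.prefix_slice("al"))  # [('alden', 'e80a17ba')]
```

First-letter buckets are simple but uneven, since some initials are far more common than others. Hashing names into a fixed number of buckets spreads the load more evenly, at the cost of having to read every bucket for a prefix or range query.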
