cassandra-user mailing list archives

From Jared winick <jaredwin...@gmail.com>
Subject Re: Scalable data model for a Metadata database
Date Tue, 09 Feb 2010 16:51:26 GMT
Thanks for the specific suggestions Jonathan, I really appreciate it.

On Tue, Feb 9, 2010 at 9:37 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> On Tue, Feb 9, 2010 at 10:01 AM, Jared winick <jaredwinick@gmail.com> wrote:
>> Somehow I need to partition the data better.  Would a recommendation
>> be to “split” the “sex” key into multiple keys? For example I could
>> append the year and month to the key (“sex_022010”) to partition the
>> data by the month it was inserted.
>
> That's one possibility.  Another would be to kill two birds with one
> stone and add the age to that key, so you'd have male_20 (probably
> better: male_1990), etc.
>
> Fundamentally TANSTAAFL and if you need to scale queries w/ lots of
> criteria like this you will have to choose (sometimes from more than
> one of) these options:
>
>  - have a lot of machines so you can parallelize brute force queries,
> e.g. w/ Hadoop
>  - precompute specific "indexes" like sex_birthdate above
>   - note, with supercolumns you can also materialize the whole
> "person" in subcolumns, rather than doing an extra lookup for each
> index hit
>  - use less-specific indexes (e.g. separate sex & birthdate indexes to
> continue the example) and do more work on the client
>
> -Jonathan
>
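
A rough sketch of two of the options quoted above, the precomputed composite index ("male_1990") and the less-specific indexes intersected on the client. This is not Cassandra client code: a plain Python dict stands in for index rows, and the names index_person, query_composite, and query_intersect, along with the "sex_" and "birthyear_" key prefixes, are made up for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for index rows: row key -> set of person ids.
# In a real deployment each key would be a row in an index column family.
index = defaultdict(set)

def index_person(person_id, sex, birth_year):
    """Write one person into every index at insert time."""
    # Precomputed composite index, e.g. "male_1990".
    index[f"{sex}_{birth_year}"].add(person_id)
    # Less-specific single-attribute indexes.
    index[f"sex_{sex}"].add(person_id)
    index[f"birthyear_{birth_year}"].add(person_id)

def query_composite(sex, birth_year):
    """One lookup against the precomputed composite key."""
    return set(index.get(f"{sex}_{birth_year}", set()))

def query_intersect(sex, birth_year):
    """Two lookups against the broader indexes, intersected on the client."""
    by_sex = index.get(f"sex_{sex}", set())
    by_year = index.get(f"birthyear_{birth_year}", set())
    return by_sex & by_year

index_person("p1", "male", 1990)
index_person("p2", "male", 1985)
index_person("p3", "female", 1990)

print(query_composite("male", 1990))
print(query_intersect("male", 1990))
```

Both queries return the same ids. The composite key costs an extra write per person but reads a single small row, while the intersect approach reads two potentially very large rows and does the filtering on the client.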
