incubator-cassandra-user mailing list archives

From Peter Chang <pete...@gmail.com>
Subject Re: Strategies for storing lexically ordered data in supercolumns
Date Sat, 13 Mar 2010 01:07:13 GMT
But wouldn't name + UUID be considered volatile? That was the crux of my
questions.

On Fri, Mar 12, 2010 at 1:07 PM, Brandon Williams <driftx@gmail.com> wrote:

> On Thu, Mar 11, 2010 at 12:54 AM, Peter Chang <peter78@gmail.com> wrote:
>
>> I'm wondering about good strategies for picking keys that I want to be
>> lexically sorted in a super column family. For example, my data looks like
>> this:
>>
>> [user1_uuid][connections][some_key_for_user2] = ""
>> [user1_uuid][connections][some_key_for_user3] = ""
>>
>> I was thinking that I wanted some_key_for_user2 to be sorted by a user's
>> name. So I was thinking I set the subcolumn compareWith to UTF8Type or
>> BytesType and construct a key
>>
>> [user's lastname + user's firstname + user's uuid]
>>
>> This would result in sorted subcolumns and a sorted user list. That's fine. But I
>> wonder what would happen if, say, a user changes their last name. Happens
>> rarely but I imagine people getting married and modifying their name. Now
>> the sort is no longer correct. There seem to be some bad consequences to
>> creating keys based on data that can change.
>>
>> So what is the general (elegant, easy to maintain) strategy here? Always
>> sort in your server-side code and don't bother trying to have the data
>> sorted?
>>
>
> Having row keys based on something potentially volatile is something I
> would avoid since that determines which machine the row belongs to and
> moving data between machines isn't a cheap operation.
>
> What you'll probably want to do is make the key something unique (like a
> uuid), store the user's name as a column on the row (thus making it easy to
> update) and maintain a secondary index to get the name-based sorting you
> want.  If you're expecting a few million users, maintaining the index in a
> special row will work fine (e.g., the row name is "NAMEINDEX" and the columns
> are the name+uuid similar to what you described.)  If you have billions of
> users, you'll need to get a bit fancier (partition based on letter of the
> last name, for example.)
>
> -Brandon
>
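
A minimal sketch of the index-row approach described in the quoted reply, using plain Python dicts and a sorted list as a stand-in for Cassandra rows and sorted column names. This is not Cassandra client code. The helper names (SortedRow, put_user, list_users, index_row_key, index_column), the "NAMEINDEX_<letter>" row naming, the lowercasing, and the "\x00" separator are all assumptions made for illustration. It shows that the user's row key (the uuid) never changes, while a rename only removes one index column and inserts another.

```python
import bisect


class SortedRow:
    """Stand-in for one row whose column names are kept in sorted order."""

    def __init__(self):
        self.columns = []

    def insert(self, name):
        i = bisect.bisect_left(self.columns, name)
        if i == len(self.columns) or self.columns[i] != name:
            self.columns.insert(i, name)

    def remove(self, name):
        i = bisect.bisect_left(self.columns, name)
        if i < len(self.columns) and self.columns[i] == name:
            del self.columns[i]


users = {}  # row key (uuid string) -> {"last": ..., "first": ...}
index = {}  # index row key -> SortedRow of name+uuid column names


def index_row_key(last):
    # Assumed partitioning scheme: one index row per first letter of last name.
    return "NAMEINDEX_" + last[:1].upper()


def index_column(last, first, user_id):
    # "\x00" sorts below any letter, so "smith" lands before "smithson".
    return "\x00".join((last.lower(), first.lower(), user_id))


def put_user(user_id, last, first):
    """Create or rename a user; the row key (user_id) stays the same."""
    old = users.get(user_id)
    if old is not None:
        # Drop the stale index column before writing the new one.
        old_col = index_column(old["last"], old["first"], user_id)
        index[index_row_key(old["last"])].remove(old_col)
    users[user_id] = {"last": last, "first": first}
    row = index.setdefault(index_row_key(last), SortedRow())
    row.insert(index_column(last, first, user_id))


def list_users():
    """Yield user ids in name order by walking the index rows in key order."""
    for key in sorted(index):
        for col in index[key].columns:
            yield col.rsplit("\x00", 1)[1]
```

Calling put_user a second time with the same id and a new last name moves only the index entry; the user row itself is updated in place. In a real cluster the remove and insert are separate writes, so a reader could briefly see both or neither index column.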
