cassandra-user mailing list archives

From Prasad Sunkari
Subject Re: Secondary indexes for multi-value fields
Date Wed, 22 Dec 2010 19:04:58 GMT

I will frame my question in a different way.

Each user in my system subscribes to updates from selected other users 
(updates are aggregated from outside) and tags the users to whom he/she 
is subscribed.

In my current design, I have a column family called "Followers" keyed by 
userid, in which each column name is the userid of another user following 
the first user.  Another super column family called "Subscriptions" is 
again keyed by userid; each super column name is the userid of the user 
to whose updates the "key" is subscribed, and the columns contain the 
tags.

Obviously I use the tags in lots of places and need a reverse index on 
tags (the list of subscriptions which have a tag).  This is done by 
maintaining another column family, "SubscriptionsByTag".
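
To make the layout above concrete, here is a minimal in-memory sketch in 
Python, with plain dicts standing in for the three column families.  The 
function name `subscribe`, the sample user ids, and the choice to key the 
reverse index by (userid, tag) are assumptions made for illustration; 
this is not Cassandra client code.

```python
from collections import defaultdict

# Plain dicts stand in for the column families (illustrative only).
followers = defaultdict(set)             # userid -> ids of users following them
subscriptions = defaultdict(dict)        # userid -> {followed userid -> set of tags}
subscriptions_by_tag = defaultdict(set)  # (userid, tag) -> followed userids with that tag


def subscribe(user, target, tags):
    """Record that `user` follows `target` under `tags`, updating all three maps."""
    followers[target].add(user)
    old_tags = subscriptions[user].get(target, set())
    new_tags = set(tags)
    subscriptions[user][target] = new_tags
    # Keep the reverse index in step with the forward data.
    for tag in old_tags - new_tags:
        subscriptions_by_tag[(user, tag)].discard(target)
    for tag in new_tags - old_tags:
        subscriptions_by_tag[(user, tag)].add(target)


subscribe("alice", "bob", ["work", "python"])
subscribe("alice", "carol", ["work"])
subscribe("alice", "bob", ["python"])  # retag: "work" is dropped for bob

print(sorted(subscriptions_by_tag[("alice", "work")]))    # ['carol']
print(sorted(subscriptions_by_tag[("alice", "python")]))  # ['bob']
```

Every change to a subscription touches both the forward data and the 
reverse index, which is the maintenance burden in question.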

Now, with the advent of secondary indexes in 0.7 can I redesign it to 
make it a little simpler?  Maybe avoid having to maintain the reverse 
index for tags?

I do understand that secondary indexes are not supported for super 
columns.  So, can I make "Subscriptions" a column family where userid 
maps to a comma-separated list of tags?  Is it possible, out of the box 
or by implementing some interface, to have a secondary index over such 
multi-valued columns?
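
A small sketch of why a comma-separated value is awkward here, assuming 
the index only supports exact matches on the whole column value (which is 
how the built-in 0.7 indexes are usually described).  The helper 
`build_equality_index` and the row layouts are hypothetical and only 
imitate that behaviour with dicts.

```python
from collections import defaultdict


def build_equality_index(rows, column):
    """Map each exact column value to the row keys holding it (hypothetical helper)."""
    index = defaultdict(set)
    for key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(key)
    return index


# Option A: one column holding a comma-separated list of tags.
rows_csv = {
    "alice": {"tags": "work,python"},
    "bob": {"tags": "work"},
}
csv_index = build_equality_index(rows_csv, "tags")
# Only bob matches: alice's indexed value is the whole string "work,python".
print(sorted(csv_index.get("work", set())))  # ['bob']

# Option B: one row per (user, tag) pair, so each indexed value is a single tag.
rows_split = {
    "alice:work": {"user": "alice", "tag": "work"},
    "alice:python": {"user": "alice", "tag": "python"},
    "bob:work": {"user": "bob", "tag": "work"},
}
tag_index = build_equality_index(rows_split, "tag")
print(sorted(rows_split[k]["user"] for k in tag_index["work"]))  # ['alice', 'bob']
```

Under that assumption, a lookup by a single tag only works when each 
indexed column holds exactly one tag.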

What in general would be the best practices for such multi-valued fields 
on which I need a secondary index too?  (Joss's reply confused me.  Am I 
right in thinking that range slices are only for retrieving values for a 
contiguous set of keys and not really for secondary indexes?)

[Sorry if I seem too naive]


On 12/22/2010 09:47 PM, Anand Somani wrote:
> One approach is to ask yourself questions as to how you would use this 
> information, for example
>   * how often do you go from user to tags
>   * how often would you want to go from tag->users.
>   * What kind of reporting would you want to do on tags and how often
>   * Can multiple people add the same tag to the same user, are they
>     maintained separately
>   * Given your business, how many users do you expect
>   * etc.
> Depending on that one approach might work better than other. I have 
> not used indexes/non id based searches (do not have that use case) in 
> Cassandra yet, so this is just based on time I have spent reading 
> about it.
> One approach using indexes was given by Jool, the other approach is 
> using reverse indexes
>   * 2 CF - one for user and one for tags (reverse index)
>   * User - might need to have a SC - with tags and some information
>     like who tagged it
>   * Tag - tag to column of users
>   * Advantage: -
>       o 1 query to find user->tags on user CF
>       o tag->users - on tag CF (I would think this would be more
>         efficient than user->tags since that will potentially hit
>         multiple rows/nodes, unless I have misunderstood secondary
>         indexes)
>   * Disadvantage
>       o Need to write to a couple of CFs, but writes are relatively
>         cheaper than reads in Cassandra
>       o Since you update 2 CFs and there are no transactions, one
>         write might succeed and the other might fail
> Even with the other suggestion of indexes you can still add the 
> tag->users.
> On Wed, Dec 22, 2010 at 4:54 AM, Prasad Sunkari wrote:
>     Hi all,
>     I have a column family for users of my system and I need to have
>     tags set to these users.  My current plan is to have a column that
>     holds a string (comma separated tags).
>     I am not clear if this is the best way to do it.  Especially because
>     this may lead to complications when more than one administrator
>     is trying to tag the same user (lost updates) as well as the
>     secondary indexes (if I wanted to use the built in secondary
>     indexes).  I also am not sure if it is possible to have a
>     secondary index on a multi-valued column!
>     Another alternative is to have it in a super column with each tag
>     being a column by itself and let my application take care of the
>     secondary indexes.
>     I am currently of the opinion that the second solution is the only
>     thing that I could do.
>     Any suggestions?  Since this is my first app on Cassandra I am
>     trying to see if my opinion is correct.
>     Thanks,
>     Prasad
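
On the partial-failure disadvantage raised in the quoted reply: one 
possible way to cope without transactions is to treat the user -> tags 
data as the source of truth and rebuild the tag -> users index from it 
when the two drift apart.  The sketch below uses plain dicts; the name 
`rebuild_tag_index` and the sample data are made up for illustration and 
are not part of Cassandra.

```python
from collections import defaultdict


def rebuild_tag_index(user_tags):
    """Derive tag -> users from user -> tags, treating the user data as the source of truth."""
    index = defaultdict(set)
    for user, tags in user_tags.items():
        for tag in tags:
            index[tag].add(user)
    return dict(index)


# Forward data was written successfully; the reverse index missed one update.
user_tags = {"alice": {"vip", "beta"}, "bob": {"beta"}}
stale_index = {"vip": {"alice"}, "beta": {"alice"}}  # "bob" never made it in

repaired = rebuild_tag_index(user_tags)
print(sorted(repaired["beta"]))  # ['alice', 'bob']
```

This only illustrates the idea of repairing the reverse index; how often 
to run such a pass, and over how much data, depends on the application.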
