incubator-cassandra-user mailing list archives

From: Mark Robson
Subject: Re: Design Pattern - Tag Cloud / Inverted Index
Date: Sun, 27 Dec 2009 16:38:54 GMT
2009/12/27 August Zajonc:

> Looking at the data model a simple solution is two column families,
> one containing items as the row-key with tags as columns, and a second
> with tags as the row-key with items as columns. This gives me fast
> access at the cost of 2x the writes (cheap) and storage (also cheap).
> So not bad.

I think this is the normal model.

However, there is no need to put them in separate column families; you could
simply use non-overlapping keys in a single one (for example, by giving item
keys and tag keys different prefixes).
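
As a rough illustration of the non-overlapping-keys idea, here is a sketch in
Python. It uses an in-memory dict as a stand-in for a single column family, not
a real Cassandra client, and the "item:" / "tag:" prefixes and function names
are assumptions made up for the example.

```python
from collections import defaultdict

# In-memory stand-in for ONE column family: row key -> {column name: value}.
# This is not a Cassandra client; it only illustrates the key layout.
store = defaultdict(dict)

def tag_item(item_id, tag):
    # Two writes per tagging: one row keyed by item, one row keyed by tag.
    # The prefixes (hypothetical) keep the two key spaces from overlapping.
    store["item:" + item_id][tag] = ""
    store["tag:" + tag][item_id] = ""

def tags_for(item_id):
    # Column names of the item row are the tags.
    return sorted(store.get("item:" + item_id, {}))

def items_for(tag):
    # Column names of the tag row are the items.
    return sorted(store.get("tag:" + tag, {}))

tag_item("post1", "cassandra")
tag_item("post1", "nosql")
tag_item("post2", "cassandra")
```

Both lookups are then a single row read, at the cost of the doubled write
mentioned in the quoted message.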

There is, however, a scalability problem when a single tag has a very large
number of items, or vice versa: you end up with a lot of columns under a
single key. As these need to be held in the RAM of a node during a query
(and possibly other operations), memory usage will blow up.

I guess the solution may be to create a number of different keys for the
same tag.
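
One way this could look, sketched in Python with an in-memory dict standing in
for the column family: each item is hashed into one of a fixed number of
bucket rows for its tag. The bucket count, the key format and the function
names are all assumptions for illustration, not anything Cassandra provides.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 16  # assumed value; tune to how large a single row may get

# In-memory stand-in for a column family: row key -> {column name: value}.
store = defaultdict(dict)

def bucket_key(tag, item_id):
    # Deterministically map an item to one of NUM_BUCKETS rows for the tag.
    bucket = zlib.crc32(item_id.encode("utf-8")) % NUM_BUCKETS
    return "tag:%s:%d" % (tag, bucket)

def add_item(tag, item_id):
    store[bucket_key(tag, item_id)][item_id] = ""

def items_for(tag):
    # Reading the full tag now means reading every bucket row,
    # but each row is only a fraction of the original size.
    items = []
    for bucket in range(NUM_BUCKETS):
        items.extend(store.get("tag:%s:%d" % (tag, bucket), {}))
    return sorted(items)

for n in range(1000):
    add_item("popular", "item%d" % n)
```

The trade-off is that a full read of the tag becomes NUM_BUCKETS row reads
instead of one, though no single row has to hold every item.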

In any case, querying a very large number of items is problematic. The user
will not usually want them all, so you'd need to prioritise them somehow
anyway. It might therefore be sufficient to store only the "highest priority"
items against a single tag key (and have other keys for the lower priority
ones). How you define priority is application-specific.
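
A minimal sketch of that idea in Python, again with an in-memory dict in place
of the column family. The ":top" / ":rest" key suffixes, the TOP_N limit and
the numeric priority are assumptions for the example; a real application would
substitute its own notion of priority.

```python
from collections import defaultdict

TOP_N = 3  # assumed cap on how many items the "top" row may hold

# In-memory stand-in for a column family: row key -> {item_id: priority}.
store = defaultdict(dict)

def add_item(tag, item_id, priority):
    top = store["tag:%s:top" % tag]
    rest = store["tag:%s:rest" % tag]
    top[item_id] = priority
    if len(top) > TOP_N:
        # Demote the lowest-priority item to the overflow row.
        lowest = min(top, key=top.get)
        rest[lowest] = top.pop(lowest)

def top_items(tag):
    # The common query reads only the small "top" row.
    top = store.get("tag:%s:top" % tag, {})
    return sorted(top, key=top.get, reverse=True)

for item_id, priority in [("a", 1), ("b", 5), ("c", 3), ("d", 4), ("e", 2)]:
    add_item("cassandra", item_id, priority)
```

The usual query then touches only the bounded "top" row; the "rest" row can
still grow large and could itself be split across several keys.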

