cassandra-user mailing list archives

From August Zajonc <augu...@augustz.com>
Subject Re: Design Pattern - Tag Cloud / Inverted Index
Date Sun, 27 Dec 2009 18:28:19 GMT
On Sun, Dec 27, 2009 at 11:38 AM, Mark Robson <markxr@gmail.com> wrote:
> 2009/12/27 August Zajonc <augustz@augustz.com>
>>
>> Looking at the data model a simple solution is two column families,
>> one containing items as the row-key with tags as columns, and a second
>> with tags as the row-key with items as columns. This gives me fast
>> access at the cost of 2x the writes (cheap) and storage (also cheap).
>> So not bad.
>
> I think this is the normal model.
>
> However, there is no need to put them in separate column-families, you could
> simply use non-overlapping keys.

Got it. One thing I wasn't sure of: does putting both in a single
column family give me a way to update the index atomically and keep it
consistent? I don't think it does.
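
To make the dual-write concern concrete, here is a minimal in-memory sketch of the model with non-overlapping keys. It is not Cassandra client code: the `store` dict stands in for one column family, and the `item:` / `tag:` prefixes, the function names, and the read-side cross-check are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy stand-in for one column family: row key -> {column name: value}.
store = defaultdict(dict)

def item_key(item_id):
    # Hypothetical prefix that keeps item rows apart from tag rows.
    return "item:" + item_id

def tag_key(tag):
    return "tag:" + tag

def add_tag(item_id, tag, timestamp):
    # Two independent writes. Nothing here makes them atomic: a failure
    # between the two lines leaves the item row without its index entry.
    store[item_key(item_id)][tag] = timestamp
    store[tag_key(tag)][item_id] = timestamp

def items_for_tag(tag):
    # One possible mitigation (an assumption, not a Cassandra feature):
    # cross-check each index entry against the item row and skip stale ones.
    return sorted(
        item_id
        for item_id in store.get(tag_key(tag), {})
        if tag in store.get(item_key(item_id), {})
    )

add_tag("post-1", "cassandra", 1)
add_tag("post-2", "cassandra", 2)
add_tag("post-1", "nosql", 3)
```

Sharing one column family changes only the key layout. The two writes are still separate operations, so the application has to tolerate or repair a half-applied update.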

>
> There is, however, a scalability problem when a single tag has a very
> large number of items, or vice versa: you will have a lot of columns in a
> single CF / key. As this needs to be held in the RAM of a node during a
> query (and possibly other operations), it will blow up memory usage.

Got it. Part of this depends on the per-column metadata overhead.
Clearly name, value, and timestamp are part of it, but is there
anything else in terms of storage / memory overhead per column that I
should consider when deciding how many columns are reasonable to fit
in a single CF / key?
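
For a rough feel of the numbers, a back-of-envelope estimate might look like the sketch below. The 8-byte timestamp reflects a 64-bit value. The `other_overhead_bytes` default is a placeholder guess for length fields and flags, not a measured figure, and in-memory (JVM object) overhead is not modelled at all.

```python
def estimate_row_bytes(num_columns, avg_name_bytes, avg_value_bytes,
                       timestamp_bytes=8, other_overhead_bytes=7):
    # Serialized size only. other_overhead_bytes is an assumed placeholder
    # for per-column length fields and flags; replace it with a measured value.
    per_column = (avg_name_bytes + avg_value_bytes
                  + timestamp_bytes + other_overhead_bytes)
    return num_columns * per_column

# One tag row indexing a million items, 16-byte item ids, empty values.
row_bytes = estimate_row_bytes(1_000_000, 16, 0)
print("%.1f MB" % (row_bytes / 1e6))
```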

Cheers,

- August




> I guess the solution may be to create a number of different keys for the
> same tag.
>
> In any case, querying a very large number of items is problematic - the user
> will not usually want them all, so you'd need to prioritise them somehow
> anyway, so it might be sufficient to only store the "highest priority" items
> against a single tag key (and have other keys for the lower priority ones).
> How you define priority is application-specific.
>
> Mark
>
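
Mark's suggestion of several keys per tag could be sketched as follows. The `tag:<name>:<bucket>` key layout, the bucket count, and the use of CRC32 are illustrative assumptions, not anything prescribed in this thread.

```python
import zlib

NUM_BUCKETS = 16  # hypothetical; size it for the largest expected tag

def bucket_key(tag, item_id, num_buckets=NUM_BUCKETS):
    # A stable hash of the item id picks the bucket, so the same item
    # always lands in the same row for a given tag.
    bucket = zlib.crc32(item_id.encode("utf-8")) % num_buckets
    return "tag:%s:%02d" % (tag, bucket)

def all_bucket_keys(tag, num_buckets=NUM_BUCKETS):
    # A reader that wants every item for a tag has to fetch all buckets.
    return ["tag:%s:%02d" % (tag, b) for b in range(num_buckets)]
```

Each write touches one bucket row, which caps the number of columns per key at roughly 1/NUM_BUCKETS of the tag's items. A full read of the tag costs NUM_BUCKETS lookups.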



-- 
August Consulting
PO Box 410384
San Francisco, CA 94141
415-358-1850 (p)
415-354-8383 (f)
