incubator-cassandra-user mailing list archives

From Zach Richardson <j.zach.richard...@gmail.com>
Subject Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE
Date Sat, 29 Oct 2011 20:30:57 GMT
Aditya,

Depending on how often you have to write to the database, you could
perform dual writes to two different column families, one that has
summary + details in it, and one that only has the summary.

This way you can get everything with one query, or just the summary
with one query. This should also help optimize your caching, since the
summary column family can be cached on its own.
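
As a rough sketch of the dual-write idea, here is what the write and read
paths might look like. Plain Python dicts stand in for the two column
families, and the names summary_cf, full_cf, write_entity, read_summary and
read_full are made up for illustration; they are not part of any Cassandra
client API.

```python
# In-memory stand-ins for two column families (assumption: a real
# Cassandra client would replace these dicts).
summary_cf = {}  # hypothetical CF holding only the small summary columns
full_cf = {}     # hypothetical CF holding summary + detail columns


def write_entity(key, summary, details):
    """Dual write: summary goes to both CFs, details only to full_cf."""
    summary_cf[key] = dict(summary)
    full_cf[key] = {**summary, **details}


def read_summary(key):
    """One lookup against the small CF, the one you would want cached."""
    return summary_cf[key]


def read_full(key):
    """One lookup returning summary and details together."""
    return full_cf[key]


write_entity("entity1", {"name": "Widget"}, {"description": "x" * 1000})
```

The cost is one extra write per update; in exchange, the large detail
columns never pass through the column family that you choose to cache.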

The question here would of course be whether you have a read-heavy or
write-heavy workload.  Since you seem to be concerned about the
caching, it sounds like you have more of a read-heavy workload and
wouldn't pay too heavily for the dual writes.

Zach


On Sat, Oct 29, 2011 at 2:21 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> On Sat, Oct 29, 2011 at 11:23 AM, Aditya Narayan <adynnn@gmail.com> wrote:
>> @Mohit:
>> I have stated the example scenarios in my first post under this heading.
>> I have also stated above why I want to split that data into two rows. As
>> Ikeda stated below, I too am trying to prevent the frequently accessed
>> rows from being bloated with large data, and I want to keep that data out
>> of the cache as well.
>
> I think you are missing the point. You don't get any benefit
> (performance, access), you are already breaking it into 2 rows.
>
> Also, I don't know of any way where you can selectively keep the rows
> or keys in the cache. Other than having some background job that keeps
> the cache hot with those keys/rows you only have one option of keeping
> it in different CF since you are already breaking a row in 2 rows.
>
>>
>>> Okay so as most know this practice is called a wide row - we use them
>>> quite a lot. However, as your schema shows it will cache (while being
>>> active) all the row in memory.  One way we got around this issue was to
>>> basically create some materialized views of any more common data so we can
>>> easily get to the minimum amount of information required without blowing too
>>> much memory with the larger representations.
>>
>> Yes, this is exactly the problem I am facing, but I want to keep both
>> types (common + large/detailed) of data in a single CF so that it could serve
>> as 'two materialized views'.
>>
>>>
>>> My perspective is that indexing some of the higher levels of data would be
>>> the way to go - Solr or elastic search for distributed or if you know you
>>> only need it local just use a caching solution like ehcache
>>
>> What do you mean exactly by  "indexing some of the higher levels of data" ?
>>
>> Thanks, you guys!
>>
>>
>>
>>>
>>> Anthony
>>>
>>>
>>> On 28/10/2011, at 21:42 PM, Aditya Narayan wrote:
>>>
>>> > I need to keep the data of some entities in a single CF but split in two
>>> > rows for each entity. One row contains an overview information for the
>>> > entity & another row contains detailed information about entity. I am
>>> > wanting to keep both rows in single CF so they may be retrieved in a single
>>> > query when required together.
>>> >
>>> > Now the problem I am facing is that I want to cache only first type of
>>> > rows(ie, the overview containing rows) & avoid second type rows(that
>>> > contains large data) from getting into cache.
>>> >
>>> > Is there a way I can manipulate such filtering of cache entering rows
>>> > from a single CF?
>>> >
>>> >
>>>
>>
>>
>
