incubator-cassandra-user mailing list archives

From Zach Richardson <j.zach.richard...@gmail.com>
Subject Re: Programmatically allow only one out of two types of rows in a CF to enter the CACHE
Date Sat, 29 Oct 2011 23:41:04 GMT
Aditya,

Have you done any benchmarking that shows where, specifically, you are having read problems?

I would be surprised if, using one of the techniques described, you weren't
able to get the performance you are looking for.
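
For reference, the dual-write idea quoted below could be sketched roughly as
follows. This is only an illustration under stated assumptions: the two
dictionaries are hypothetical in-memory stand-ins for the two column families,
and the names summary_cf, full_cf, write_entity, read_summary and read_full
are made up for the example, not part of any Cassandra client API.

```python
# Hypothetical in-memory stand-ins for two column families. A real client
# would issue the same pair of writes against Cassandra instead.
summary_cf = {}  # row key -> summary columns only (small, cache-friendly)
full_cf = {}     # row key -> summary + detail columns (large)


def write_entity(key, summary, details):
    """Dual write: the summary goes to both CFs, the details to only one."""
    summary_cf[key] = dict(summary)
    full_cf[key] = {**summary, **details}


def read_summary(key):
    """One lookup that touches only the small rows."""
    return summary_cf[key]


def read_full(key):
    """One lookup that returns summary and details together."""
    return full_cf[key]


write_entity("entity:1", {"title": "overview"}, {"body": "x" * 10000})
```

With this layout, row caching would only need to be turned on for the
summary column family, so the large rows never compete for cache space.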

Zach

On Sat, Oct 29, 2011 at 3:35 PM, Aditya Narayan <adynnn@gmail.com> wrote:
> Thanks Zach, nice idea!
>
> And what about looking at some custom caching solutions instead, leaving
> aside Cassandra's caching?
>
>
>
> On Sun, Oct 30, 2011 at 2:00 AM, Zach Richardson
> <j.zach.richardson@gmail.com> wrote:
>>
>> Aditya,
>>
>> Depending on how often you have to write to the database, you could
>> perform dual writes to two different column families, one that has
>> summary + details in it, and one that only has the summary.
>>
>> This way you can get everything with one query, or just the summary with
>> one query. This should also help optimize your caching.
>>
>> The question here would of course be whether you have a read-heavy or
>> write-heavy workload.  Since you seem to be concerned about
>> caching, it sounds like you have more of a read-heavy workload and
>> wouldn't pay too heavily for the dual writes.
>>
>> Zach
>>
>>
>> On Sat, Oct 29, 2011 at 2:21 PM, Mohit Anchlia <mohitanchlia@gmail.com>
>> wrote:
>> > On Sat, Oct 29, 2011 at 11:23 AM, Aditya Narayan <adynnn@gmail.com>
>> > wrote:
>> >> @Mohit:
>> >> I have stated the example scenarios in my first post under this
>> >> heading.
>> >> I have also stated above why I want to split that data into two rows.
>> >> Like Ikeda stated below, I too am trying to prevent the frequently
>> >> accessed rows from being bloated with large data, and I want to keep
>> >> that data from entering the cache as well.
>> >
>> > I think you are missing the point. You don't get any benefit
>> > (performance, access) from a single CF, since you are already breaking
>> > it into 2 rows.
>> >
>> > Also, I don't know of any way to selectively keep rows or keys in
>> > the cache. Other than having some background job that keeps the cache
>> > hot with those keys/rows, your only option is to keep them in different
>> > CFs, since you are already breaking a row into 2 rows.
>> >
>> >>
>> >>> Okay so as most know this practice is called a wide row - we use them
>> >>> quite a lot. However, as your schema shows, it will cache (while being
>> >>> active) the whole row in memory.  One way we got around this issue was
>> >>> to basically create some materialized views of the more common data so
>> >>> we can easily get to the minimum amount of information required without
>> >>> blowing too much memory with the larger representations.
>> >>
>> >> Yes, this is exactly the problem I am facing, but I want to keep both
>> >> types (common + large/detailed) of data in a single CF so that it can
>> >> serve as 'two materialized views'.
>> >>
>> >>>
>> >>> My perspective is that indexing some of the higher levels of data
>> >>> would be the way to go - Solr or Elasticsearch for distributed, or if
>> >>> you know you only need it locally, just use a caching solution like
>> >>> Ehcache.
>> >>
>> >> What exactly do you mean by "indexing some of the higher levels of
>> >> data"?
>> >>
>> >> Thank you guys!
>> >>
>> >>
>> >>
>> >>>
>> >>> Anthony
>> >>>
>> >>>
>> >>> On 28/10/2011, at 21:42 PM, Aditya Narayan wrote:
>> >>>
>> >>> > I need to keep the data of some entities in a single CF but split
>> >>> > into two rows for each entity. One row contains overview information
>> >>> > for the entity & another row contains detailed information about the
>> >>> > entity. I want to keep both rows in a single CF so they may be
>> >>> > retrieved in a single query when required together.
>> >>> >
>> >>> > Now the problem I am facing is that I want to cache only the first
>> >>> > type of rows (i.e., the rows containing the overview) & keep the
>> >>> > second type of rows (which contain large data) out of the cache.
>> >>> >
>> >>> > Is there a way I can filter which rows from a single CF enter the
>> >>> > cache?
>> >>> >
>> >>> >
>> >>>
>> >>
>> >>
>> >
>
>
