cassandra-user mailing list archives

From Weijun Li <weiju...@gmail.com>
Subject Re: Testing row cache feature in trunk: write should put record in cache
Date Fri, 19 Feb 2010 21:25:44 GMT
Is it in trunk too? I'm running a trunk build (from the end of last week)
in a cluster and saw the disk I/O bottleneck.

On Fri, Feb 19, 2010 at 1:03 PM, Jonathan Ellis <jbellis@gmail.com> wrote:

> mmap is designed to handle that case, yes. It is already in the 0.6 branch.
>
> On Fri, Feb 19, 2010 at 2:44 PM, Weijun Li <weijunli@gmail.com> wrote:
> > I see. How much is the overhead of Java serialization? Does it slow down
> > the system a lot? It seems to be a tradeoff between CPU usage and memory.
> >
> > As for mmap in 0.6, do you mmap the sstable data file even if it is a lot
> > larger than the available memory (e.g., the data file is over 100GB while
> > you have only 8GB of RAM)? How efficient is mmap in that case? Is mmap
> > already checked into the 0.6 branch?
> >
> > -Weijun
> >
> > On Fri, Feb 19, 2010 at 4:56 AM, Jonathan Ellis <jbellis@gmail.com>
> > wrote:
> >>
> >> The whole point of the row cache is to avoid the serialization overhead,
> >> though.  If we just wanted the serialized form cached, we would let
> >> the OS block cache handle that without adding an extra layer.  (0.6
> >> uses mmap'd I/O by default on 64-bit JVMs, so this is very efficient.)
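A minimal sketch of what mmap'd reads look like in Java NIO, bearing on the
100GB-file / 8GB-RAM question above: the kernel pages data in on demand, so
the mapping can be far larger than physical memory and only the hot pages
stay resident. The file name is made up, and this is not Cassandra's actual
reader code:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapSketch
    {
        public static void main(String[] args) throws Exception
        {
            RandomAccessFile raf = new RandomAccessFile("Standard1-Data.db", "r");
            FileChannel channel = raf.getChannel();

            // Map up to 2GB of the file (a single MappedByteBuffer is
            // limited to Integer.MAX_VALUE bytes; bigger files need
            // several buffers over successive regions).
            long size = Math.min(channel.size(), Integer.MAX_VALUE);
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);

            // Reads go straight through the OS page cache: no read()
            // syscall per access and no extra copy into a Java-heap buffer.
            byte firstByte = buffer.get(0);
            System.out.println(firstByte);
            raf.close();
        }
    }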
> >>
> >> On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li <weijunli@gmail.com> wrote:
> >> > The memory overhead issue is not directly related to GC: by the time
> >> > the JVM ran out of memory, the GC had already been very busy for quite
> >> > a while. In my case the JVM consumed all of its 6GB when the row cache
> >> > size hit 1.4 million.
> >> >
> >> > I haven't started testing the row cache feature yet, but I think data
> >> > compression would help reduce memory consumption, because in my
> >> > experience disk I/O is always the bottleneck for Cassandra while its
> >> > CPU usage stays low. Compression should also dramatically reduce the
> >> > number of Java objects (correct me if I'm wrong), especially when we
> >> > need to cache most of the data to achieve decent read latency.
> >> >
> >> > If ColumnFamily is serializable, it shouldn't be that hard to
> >> > implement the compression feature, controlled by an option (again :-)
> >> > in the storage conf XML.
> >> >
> >> > When I get to that point you can instruct me to implement this feature
> >> > along with the row-cache write-through. Our goal is straightforward:
> >> > to support low read latency in a high-volume web application with a
> >> > write/read ratio of 1:1.
> >> >
> >> > -Weijun
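A minimal sketch of the compression idea, assuming the cache held each row's
serialized bytes; the serialization step itself is elided, and this is not
Cassandra's actual API:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    public class RowCompressor
    {
        // Compress a row's serialized form before caching it. The cache
        // would then hold one byte[] per row instead of a graph of column
        // objects, cutting both heap usage and object count.
        public static byte[] compressRow(byte[] serializedRow) throws IOException
        {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(serializedRow);
            gz.close(); // flushes and writes the gzip trailer
            return bos.toByteArray();
        }
    }

The trade-off Jonathan raises above still applies: every cache hit would pay
a decompress-and-deserialize cost, which is exactly the overhead the row
cache is meant to avoid.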
> >> >
> >> > -----Original Message-----
> >> > From: Jonathan Ellis [mailto:jbellis@gmail.com]
> >> > Sent: Thursday, February 18, 2010 12:04 PM
> >> > To: cassandra-user@incubator.apache.org
> >> > Subject: Re: Testing row cache feature in trunk: write should put
> >> > record in cache
> >> >
> >> > Did you force a GC from jconsole to make sure you weren't just
> >> > measuring uncollected garbage?
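For anyone reproducing the measurement: the "Perform GC" that jconsole offers
can also be triggered from code, so heap numbers reflect live objects rather
than uncollected garbage. A small sketch:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;

    public class HeapCheck
    {
        public static void main(String[] args)
        {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            memory.gc(); // equivalent to jconsole's "Perform GC" button
            long used = memory.getHeapMemoryUsage().getUsed();
            System.out.println("live heap bytes: " + used);
        }
    }

(Running jmap -histo:live <pid> does the same forced collection before
printing a per-class histogram, which helps attribute usage to cache entries.)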
> >> >
> >> > On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li <weijunli@gmail.com> wrote:
> >> >> OK, I'll work on the change later because there's another problem to
> >> >> solve: the cache overhead is so big that 1.4 million records (1KB
> >> >> each) consumed all of the JVM's 6GB of memory (I guess 4GB are
> >> >> consumed by the row cache). I'm thinking that ConcurrentHashMap is
> >> >> not a good choice for an LRU cache, and that the row cache needs to
> >> >> store compressed key data to reduce memory usage. I'll do more
> >> >> investigation on this and let you know.
> >> >>
> >> >> -Weijun
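A back-of-envelope check: 1.4 million rows at 1KB each is only about 1.4GB
of raw data, so most of the 6GB is per-entry object overhead. On the LRU
point, a minimal bounded LRU map can be built on LinkedHashMap in access
order; this is just an illustration, not the cache implementation under
discussion:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // LinkedHashMap with accessOrder=true moves each entry to the tail on
    // get(), and removeEldestEntry evicts the least-recently-used entry
    // once the cap is exceeded.
    public class LruCache<K, V> extends LinkedHashMap<K, V>
    {
        private final int capacity;

        public LruCache(int capacity)
        {
            super(16, 0.75f, true); // true = access order, not insertion order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest)
        {
            return size() > capacity;
        }
    }

Unlike ConcurrentHashMap, LinkedHashMap is not thread-safe, so it would need
external synchronization (e.g., Collections.synchronizedMap): the usual
trade-off between concurrency and strict LRU eviction.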
> >> >>
> >> >> On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis <jbellis@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> ... tell you what, if you write the option-processing part in
> >> >>> DatabaseDescriptor I will do the actual cache part. :)
> >> >>>
> >> >>> On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis <jbellis@gmail.com>
> >> >>> wrote:
> >> >>> > https://issues.apache.org/jira/secure/CreateIssue!default.jspa,
> >> >>> > but this is pretty low priority for me.
> >> >>> >
> >> >>> > On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li <weijunli@gmail.com>
> >> >>> > wrote:
> >> >>> >> Just tried to make a quick change to enable it, but it didn't
> >> >>> >> work out :-(
> >> >>> >>
> >> >>> >>     ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
> >> >>> >>
> >> >>> >>     // What I modified: populate the cache if the row isn't there,
> >> >>> >>     // then re-fetch it so the mutation below is applied to it.
> >> >>> >>     if (cachedRow == null) {
> >> >>> >>         cfs.cacheRow(mutation.key());
> >> >>> >>         cachedRow = cfs.getRawCachedRow(mutation.key());
> >> >>> >>     }
> >> >>> >>
> >> >>> >>     if (cachedRow != null)
> >> >>> >>         cachedRow.addAll(columnFamily);
> >> >>> >>
> >> >>> >> How can I open a ticket for you to make the change (enable row
> >> >>> >> cache write-through with an option)?
> >> >>> >>
> >> >>> >> Thanks,
> >> >>> >> -Weijun
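A hypothetical sketch of the option-guarded version being requested here; the
rowCacheWriteThrough flag and its DatabaseDescriptor accessor are invented
names for illustration, not existing Cassandra code:

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

    // Only populate the cache on the write path when the (hypothetical)
    // storage-conf option is enabled.
    if (cachedRow == null && DatabaseDescriptor.getRowCacheWriteThrough())
    {
        cfs.cacheRow(mutation.key());
        cachedRow = cfs.getRawCachedRow(mutation.key());
    }

    if (cachedRow != null)
        cachedRow.addAll(columnFamily);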
> >> >>> >>
> >> >>> >> On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis <jbellis@gmail.com>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis <jbellis@gmail.com>
> >> >>> >>> wrote:
> >> >>> >>> > On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li <weijunli@gmail.com>
> >> >>> >>> > wrote:
> >> >>> >>> >> Just started to play with the row cache feature in trunk: it
> >> >>> >>> >> seems to be working fine so far, except that for the RowsCached
> >> >>> >>> >> parameter you need to specify a number of rows rather than a
> >> >>> >>> >> percentage (e.g., "20%" doesn't work).
> >> >>> >>> >
> >> >>> >>> > 20% works, but it's 20% of the rows at server startup.  So on a
> >> >>> >>> > fresh start that is zero.
> >> >>> >>> >
> >> >>> >>> > Maybe we should just get rid of the % feature...
> >> >>> >>>
> >> >>> >>> (Actually, it shouldn't be hard to update this on flush, if you
> >> >>> >>> want to open a ticket.)
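For reference, RowsCached in this era was a per-ColumnFamily attribute in
storage-conf.xml, taking either an absolute row count or a percentage. A
sketch, using the stock example keyspace and column family names rather than
anything from this thread:

    <Keyspaces>
      <Keyspace Name="Keyspace1">
        <!-- Absolute count: cache up to 100,000 rows. -->
        <ColumnFamily Name="Standard1" RowsCached="100000"/>
        <!-- Percentage: 20% of the row count sampled at server startup,
             which is zero on a fresh start, as noted above. -->
        <ColumnFamily Name="Standard2" RowsCached="20%"/>
      </Keyspace>
    </Keyspaces>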
> >> >>> >>
> >> >>> >>
> >> >>> >
> >> >>
> >> >>
> >> >
> >> >
> >
> >
>
