cassandra-user mailing list archives

From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: Cassandra memtable and GC
Date Mon, 22 Nov 2010 18:04:44 GMT
Look at the graph again. Especially from the first posting.
The records/second read (by the client) goes down as disk reads go down.

Looks like something is fishy with the memtables.

Terje
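
A rough way to put a number on the pattern in the graphs is to compute the
Pearson correlation between log10 of the sampled MemtableDataSize and another
series sampled at the same instants (cpu load, or client records/second).
This is only a sketch: the function names are made up for illustration, and
the input series have to come from your own JMX and client measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def log_size_correlation(memtable_bytes, other_series):
    """Correlate log10(MemtableDataSize) with another metric.

    memtable_bytes: MemtableDataSize samples in bytes (must be > 0).
    other_series:   cpu load or read throughput sampled at the same times.
    Both names are hypothetical; nothing here talks to Cassandra or JMX.
    """
    logs = [math.log10(b) for b in memtable_bytes]
    return pearson(logs, other_series)
```

A value near +1 against cpu load, or near -1 against read throughput, would be
consistent with the graphs. It would not by itself say whether the memtable or
something else is the cause.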

On Tue, Nov 23, 2010 at 1:54 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> On Mon, Nov 22, 2010 at 8:28 AM, Shotaro Kamio <kamioshot@gmail.com>
> wrote:
> > Hi Peter,
> >
> > I've tested again, this time recording LiveSSTableCount and
> > MemtableDataSize via JMX. I think this result supports my suspicion
> > about memtable performance, because I cannot find any Full GC this time.
> > This result is from a smaller data set (160 million records in
> > Cassandra) on a different disk configuration from my previous post. But
> > the general picture doesn't change.
> >
> > The attached files:
> > - graph-read-throughput-diskT.png:  read throughput on my client program.
> > - graph-diskT-stat-with-jmx.png: graph of cpu load, LiveSSTableCount
> > and logarithm of MemtableDataSize.
> > - log-gc.20101122-12:41.160M.log.gz: GC log with -XX:+PrintGC
> > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> >
> > As you can see from the second graph, the logarithm of MemtableDataSize
> > and cpu load have a clear correlation. When a memtable is flushed and a
> > new SSTable is created (LiveSSTableCount is incremented), read
> > performance recovers, but it soon degrades again.
> > I couldn't find any Full GC in the GC log in this test. So, I guess that
> > this performance drop is not a result of GC activity.
> >
> >
> > Regards,
> > Shotaro
> >
> >
> > On Sat, Nov 20, 2010 at 6:37 PM, Peter Schuller
> > <peter.schuller@infidyne.com> wrote:
> >>> After a memtable flush, you see minimum cpu and maximum read
> >>> throughput, both in terms of disk and cassandra records read.
> >>> As the memtable increases in size, cpu goes up and reads drop.
> >>> Whether this is because of a memtable or GC performance issue is the
> >>> big question.
> >>>
> >>> As each memtable is just 128MB when flushed, I don't really expect GC
> >>> problems or caching issues.
> >>
> >> A memtable is basically just a ConcurrentSkipListMap. Unless you are
> >> somehow triggering some kind of degenerate case in the CSLM itself,
> >> which seems unlikely, the only common circumstance where filling the
> >> memtable should result in a very significant performance drop
> >> is if you're running really close to the heap size and causing
> >> additional GC asymptotically as the memtable grows.
> >>
> >> But that doesn't seem to be the case. I don't know, maybe I missed
> >> something in your original post, but I'm not sure what to suggest that
> >> I haven't already without further information/hands-on
> >> experimentation/observation.
> >>
> >> But running with verbose GC as I mentioned should at least be a good
> >> start (-Xloggc:path/to/gclog
> >> -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps).
> >>
> >> --
> >> / Peter Schuller
> >>
> >
> >
> >
> > --
> > Shotaro Kamio
> >
>
> "As you can see from the second graph, the logarithm of MemtableDataSize
> and cpu load have a clear correlation."
>
> This makes sense.
>
> "You'll see the disk read throughput is periodically going down and up.
> At 17:45:00, it shows zero disk read/sec." --> This must mean that
> your load is being completely served from cache. If you have a very
> high cache hit rate, CPU and memory are the ONLY factors. If CPU and
> memtables are the only factors, then larger memtables will start to
> perform slower than smaller memtables.
>
> Possibly, with SSDs, the conventional thinking on larger SSTables does
> not apply (at least for your active set).
>
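
To repeat the "no Full GC" check from the quoted message on another log, one
option is to scan the file for Full GC entries. The sketch below assumes the
classic HotSpot format produced by -XX:+PrintGCDetails -XX:+PrintGCTimeStamps,
where each entry starts with the JVM uptime in seconds followed by "[GC" or
"[Full GC". The function names are hypothetical, and other collectors or JVM
versions may log differently, so an empty result is a hint rather than proof.

```python
import gzip
import re

# Assumed entry shape: "<uptime seconds>: [Full GC ..." at the start of a line.
FULL_GC = re.compile(r"^(\d+\.\d+): \[Full GC")


def full_gc_timestamps(lines):
    """Return the JVM uptime (seconds) of every Full GC entry in the lines."""
    stamps = []
    for line in lines:
        match = FULL_GC.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def full_gc_timestamps_in_file(path):
    """Same as above, reading a plain or gzip-compressed GC log from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return full_gc_timestamps(handle)
```

An empty list for the whole run matches what Shotaro reports. Young-generation
pauses are not counted here and would need a separate look.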
