cassandra-user mailing list archives

From "Hiller, Dean" <>
Subject Re: index_interval memory savings in our case (if you are curious)... (and performance result)...
Date Thu, 21 Mar 2013 19:38:08 GMT
It had only been running for 2 hours back then, but it has been a full 24
hours now and our read ping program is still showing the same read times
pretty consistently.


On 3/21/13 1:51 AM, "Andras Szerdahelyi"
<> wrote:

>Wow. So LCS with a bloom filter fp chance of 0.1 and an index sampling rate
>of 512 on a column family of 1.7 billion rows per node yields 100% of results
>on the first sstable read? That sounds amazing. And I assume this is
>cfhistograms output from a node that has been on 512 for a while? ( I
>still think it's unlikely 1.2x re-samples sstables on startup -- I'm on
>1.1x though ) For LCS, same fp chance and sampling rate, with 300-500mil
>rows per node ( 300-400GB ) on 1.1x my sstable reads for a single read got
>pretty much out of control.
>On 20/03/13 14:35, "Hiller, Dean" <> wrote:
>>I am using LCS, so the bloom filter fp default for 1.2.2 is 0.1, and my
>>bloom filter size is 1.27G RAM (nodetool cfstats)....1.7 billion rows each
>>My cfstats for this CF is attached (since cut and paste screwed up the
>>formatting).  During testing in QA, we were not sure if the index_interval
>>change was working, so we dug into the code to find out; it basically
>>to immediately convert on startup though doesn't log anything except at a
>>"debug" level which we don't have on.
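
The 1.27G bloom filter figure above can be sanity-checked against the
textbook sizing formula for an optimally configured bloom filter,
m = -n * ln(p) / (ln 2)^2 bits. The sketch below is only that formula, not
Cassandra's actual sizing code; the function name is hypothetical, and a real
implementation may allocate more than this lower bound (for example by
rounding bits per key up), which could account for part of the gap to the
reported 1.27G.

```python
import math


def optimal_bloom_filter_bytes(num_keys: int, fp_chance: float) -> float:
    """Textbook lower bound on bloom filter size, in bytes.

    Hypothetical helper for illustration; not Cassandra's sizing logic.
    """
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    # m = -n * ln(p) / (ln 2)^2 bits for an optimally tuned filter
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8


if __name__ == "__main__":
    rows = 1_700_000_000  # rows per node, as quoted in the thread
    size = optimal_bloom_filter_bytes(rows, 0.1)
    print(f"theoretical minimum at fp=0.1: {size / 1e9:.2f} GB")
```

By this formula the size grows with ln(1/p), so tightening the fp chance
from 0.1 to 0.01 would roughly double the filter.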
>>On 3/20/13 6:58 AM, "Andras Szerdahelyi"
>><> wrote:
>>>I am curious, thanks. ( I am in the same situation, big nodes choking
>>>under 300-400G data load, 500mil keys )
>>>What does your "cfhistograms Keyspace CF" output look like? How many
>>>sstable reads ?
>>>What is your bloom filter fp chance ?
>>>On 20/03/13 13:54, "Hiller, Dean" <> wrote:
>>>>Oh, and to give you an idea of memory savings, we had a node at 10G RAM
>>>>usage...we had upped a few nodes to 16G from 8G as we don't have our
>>>>nodes ready yet(we know we should be at 8G but we would have a dead
>>>>cluster if we did that).
>>>>On startup, the initial RAM is around 6-8G.  Startup with
>>>>index_interval=512 resulted in a 2.5G-2.8G initial RAM and I have seen it
>>>>grow to 3.3G and back down to 2.8G.  We just rolled this out an hour ago.
>>>>Our website response time is the same as before as well.
>>>>We rolled to only 2 nodes(out of 6) in our cluster so far to test it
>>>>and let it soak a bit.  We will slowly roll to more nodes monitoring
>>>>performance as we go.  Also, since dynamic snitch is not working with
>>>>SimpleSnitch, we know that just one slow node affects our website(from
>>>>personal pain/experience of nodes hitting RAM limit and slowing down
>>>>causing website to get real slow).
>>>>On 3/20/13 6:41 AM, "Andras Szerdahelyi"
>>>><> wrote:
>>>>>2. Upping index_interval from 128 to 512 (this seemed to reduce our
>>>>>usage significantly!!!)
>>>>>I'd be very careful with that as a one-stop improvement solution for
>>>>>reasons AFAIK
>>>>>1) you have to rebuild sstables ( not an issue if you are evaluating,
>>>>>test writes.. Etc, not so much in production )
>>>>>2) it can affect reads ( number of sstable reads to serve a read )
>>>>>especially if your key/row cache is ineffective
>>>>>On 20/03/13 13:34, "Hiller, Dean" <> wrote:
>>>>>>Also, look at the cassandra logs.  I bet you see the typical...blah
>>>>>>at 0.85, doing memory cleanup which is not exactly GC but cassandra
>>>>>>management...and of course, you have GC on top of that.
>>>>>>If you need to get your memory down, there are multiple ways
>>>>>>1. Switching size tiered compaction to leveled compaction(with 1
>>>>>>narrow rows, this helped us quite a bit)
>>>>>>2. Upping index_interval from 128 to 512 (this seemed to reduce our
>>>>>>usage significantly!!!)
>>>>>>3. Just add more nodes as moving the rows to other servers reduces
>>>>>>from #1 and #2 above since the server would have fewer rows
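
To illustrate why option 2 can save memory: with index_interval=N, roughly
one in every N row keys of each sstable is held in the in-memory index
sample, so going from 128 to 512 keeps about a quarter as many entries. A
minimal sketch, assuming the 1.7 billion rows per node mentioned elsewhere in
this thread and treating the node as one big sstable (sampling is really done
per sstable); the function name is hypothetical and the per-entry byte cost
is not modeled.

```python
def sampled_index_entries(num_keys: int, index_interval: int) -> int:
    """Approximate number of index entries kept in memory.

    Hypothetical helper: assumes one sampled entry per `index_interval`
    keys, rounded up, and ignores per-sstable boundaries.
    """
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    if index_interval <= 0:
        raise ValueError("index_interval must be positive")
    # Ceiling division without floating point
    return -(-num_keys // index_interval)


if __name__ == "__main__":
    rows = 1_700_000_000  # rows per node, as quoted in the thread
    at_128 = sampled_index_entries(rows, 128)
    at_512 = sampled_index_entries(rows, 512)
    print(f"index_interval=128: {at_128:,} sampled entries")
    print(f"index_interval=512: {at_512:,} sampled entries")
    print(f"reduction factor: {at_128 / at_512:.2f}x")
```

The trade-off raised in the thread still applies: a sparser sample means a
longer scan of the on-disk index per lookup, so read latency is worth
monitoring after the change.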
>>>>>>On 3/20/13 6:29 AM, "Andras Szerdahelyi"
>>>>>><> wrote:
>>>>>>>I'd say GC. Please fill in form CASS-FREEZE-001 below and get
>>>>>>>:-) ( sorry )
>>>>>>>How big is your JVM heap ? How many CPUs ?
>>>>>>>Garbage collection taking long ? ( look for log lines from
>>>>>>>Running out of heap ? ( "heap is .. full" log lines )
>>>>>>>Any tasks backing up / being dropped ? ( nodetool tpstats and
>>>>>>>in last .. ms" log lines )
>>>>>>>Are writes really slow? ( nodetool cfhistograms Keyspace
>>>>>>>How much is lots of data? Wide or skinny rows? Mutations/sec ?
>>>>>>>Which Compaction Strategy are you using? Output of show schema
>>>>>>>cassandra-cli ) for the relevant Keyspace/CF might help as well
>>>>>>>What consistency are you doing your writes with ? I assume ONE
>>>>>>>you have a single node.
>>>>>>>What are the values for these settings in cassandra.yaml
>>>>>>>Which version of Cassandra?
>>>>>>>From:  Joel Samuelsson <>
>>>>>>>Reply-To:  "" <>
>>>>>>>Date:  Wednesday 20 March 2013 13:06
>>>>>>>To:  "" <>
>>>>>>>Subject:  Cassandra freezes
>>>>>>>I've been trying to load test a one node cassandra cluster. When
>>>>>>>lots of data, the Cassandra node freezes for 4-5 minutes during
>>>>>>>neither reads nor writes are served.
>>>>>>>During this time, Cassandra takes 100% of a single CPU core.
>>>>>>>My initial thought was that this was Cassandra flushing memtables
>>>>>>>disk, however, the disk i/o is very low during this time.
>>>>>>>Any idea what my problem could be?
>>>>>>>I'm running in a virtual environment in which I have no control
>>>>>>>So commit log and data directory is (probably) on the same drive.
>>>>>>>Best regards,
>>>>>>>Joel Samuelsson
