incubator-cassandra-user mailing list archives

From Andrew Cooper <Andrew.Coo...@nisc.coop>
Subject Re: Large system.Migration CF after upgrade to 1.1
Date Thu, 28 Nov 2013 00:21:02 GMT
We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large (~4GB)
row in system.Migrations on each cluster node.
There is some code in there to drop that CF at startup, but I’m not sure of the requirements
for it to run. If the timestamps on the SSTables have not been updated in a while, copy them
out of the way and restart.

     I was able to clear the Schema and Migrations CFs with: nodetool drain -> stop -> move
SSTables out -> start
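
The "move SSTables out" step could be scripted roughly as follows. This is a minimal sketch, not a tested procedure: it assumes the node has already been drained and stopped, and that the caller passes the directory that actually holds the CF's SSTable files. The function name `move_sstables_out` and any paths used with it are hypothetical.

```python
import shutil
from pathlib import Path


def move_sstables_out(cf_dir, backup_dir):
    """Move every regular file in cf_dir into backup_dir; return the moved names.

    Assumes Cassandra has already been drained and stopped on this node.
    """
    cf_dir = Path(cf_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(cf_dir.iterdir()):
        if path.is_file():  # leave subdirectories (e.g. snapshots) in place
            shutil.move(str(path), str(backup_dir / path.name))
            moved.append(path.name)
    return moved
```

It would be called once per CF directory (for example the directories holding system.Migrations and system.Schema) before starting the node again. Moving rather than deleting keeps the files available if they need to be put back.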

We are also seeing heap pressure / Full GC issues when we do schema updates to this cluster.
How much memory does the machine have and how is the JVM configured?

Here are our current specs:
28 nodes.
48GB OS mem, 8 to 12 cores
3-drive RAID 0, SATA 7200 RPM
~333GB load per node
6 keyspaces, ~3000 column families (a knowingly bad design, basically a set of CFs per tenant)
We had to ratchet up the heap a few times and currently run at 20GB (up from the original 8GB).
GC can usually get down to roughly 8 or 9GB, but we seem to need a bit of extra headroom to stay
stable.
Row cache and key cache disabled (due to heap pressure issues a while back)

Load profile is usually more reads than writes, but constant on both

Schema updates seem to be better, though I am not sure whether removing the old CFs did the trick.
We still get sporadic situations where heap usage skyrockets on a few nodes, which starts a domino
effect of nodes going offline around the ring and results in lots of client timeouts.
I suspect most of our issues lie in the code design of the applications on top of Cassandra,
but it is always difficult to troubleshoot the root cause when Cassandra gets into its funk.

We are slowly migrating keyspaces into their own clusters for further isolation and performance
gains, which will help long term. We are also in the process of consolidating CFs to reduce
the overall schema size.

On pre-1.2 versions that is often a result of memory pressure from the bloom filters and
compression metadata being on the JVM heap. Do you have a lot of rows per node (i.e. > 500 million)?

Check how small CMS can get the heap; it may be the case that it just cannot reduce it further.

As a workaround you can increase the heap, increase bloom_filter_fp_chance (per CF), and
increase index_interval (yaml). My talk “In case of emergency break glass” at the summit
in SF this year covers this: http://thelastpickle.com/speaking/2013/06/11/Speaking-Cassandra-Summit-SF-2013.html
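
To get a feel for why raising bloom_filter_fp_chance and index_interval relieves the heap, here is a rough back-of-the-envelope sketch. It uses the textbook formula for an optimally sized bloom filter, which only approximates what Cassandra actually allocates, and it treats index_interval as "keep every Nth row key in memory". The function names, the row count, and the sample values are illustrative assumptions, not measurements from this cluster.

```python
import math


def bloom_filter_bytes(rows, fp_chance):
    """Approximate bytes for an optimally sized bloom filter holding `rows` keys."""
    # Textbook sizing: bits per key = -ln(p) / (ln 2)^2
    bits_per_key = -math.log(fp_chance) / (math.log(2) ** 2)
    return rows * bits_per_key / 8


def index_sample_entries(rows, index_interval):
    """Number of row keys held in memory when every Nth key is sampled."""
    return rows // index_interval


if __name__ == "__main__":
    rows = 500_000_000  # illustrative: the "> 500 million rows per node" case
    for fp in (0.001, 0.01, 0.1):
        mib = bloom_filter_bytes(rows, fp) / 2**20
        print(f"fp_chance={fp}: ~{mib:.0f} MiB of bloom filter")
    for interval in (128, 512):
        print(f"index_interval={interval}: {index_sample_entries(rows, interval)} sampled keys")
```

Each tenfold increase in the false-positive chance removes a fixed number of bits per key, and quadrupling the sampling interval cuts the number of sampled keys to a quarter. The trade-off is more disk reads per lookup.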

I was at your talk and really appreciated the information; we have used it in our scale-out
to a second datacenter.

Long term moving to 1.2 will help.

We do plan to upgrade to 1.2 within the next month. All of our other clusters are already
running 1.2, but this is our largest and most problematic cluster :)

-Andrew

Hope that helps.


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/11/2013, at 10:35 am, Andrew Cooper <Andrew.Cooper@nisc.coop> wrote:

We have noticed that a cluster we upgraded to 1.1.6 (from 1.0.*) still has a single large
(~4GB) row in system.Migrations on each cluster node. We are also seeing heap pressure /
Full GC issues when we do schema updates to this cluster. If the two are related, is it possible
to somehow remove/truncate the system.Migrations CF? If I understand correctly, version 1.1
no longer uses this CF, instead using the system.schema_* CFs. We have multiple clusters,
and clusters which were built from scratch at version 1.1 or 1.2 do not have data in system.Migrations.

I would appreciate any advice and I can provide more details if needed.

-Andrew

Andrew Cooper
National Information Solutions Cooperative®
3201 Nygren Drive NW
Mandan, ND 58554
• e-mail: andrew.cooper@nisc.coop
• phone: 866.999.6472 ext 6824
• direct: 701-667-6824



