I recently became aware of a Java driver bug that might be causing similar symptoms. Do you, perchance, have any keyspaces whose replication is defined against non-existent data centers?

https://datastax-oss.atlassian.net/browse/JAVA-702

If so, fixing that replication setting and restarting the agents should fix this issue.
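
For anyone who wants to check this quickly, here is a minimal sketch in Python. It assumes you have already copied each keyspace's replication settings (for example from DESCRIBE KEYSPACE output) and the data center names that actually exist (for example from nodetool status) into plain dictionaries and sets by hand. The function name and the sample keyspaces and data centers are hypothetical, not taken from any real cluster.

```python
def find_unknown_dcs(keyspaces, known_dcs):
    """Return {keyspace: [dc, ...]} for replication entries naming unknown data centers."""
    problems = {}
    for name, replication in keyspaces.items():
        # Only NetworkTopologyStrategy maps data center names to replica counts.
        if not replication.get("class", "").endswith("NetworkTopologyStrategy"):
            continue
        unknown = sorted(
            dc for dc in replication if dc != "class" and dc not in known_dcs
        )
        if unknown:
            problems[name] = unknown
    return problems


# Hypothetical data, typed in by hand for illustration only.
keyspaces = {
    "app": {"class": "NetworkTopologyStrategy", "us-east": "3", "us-west": "3"},
    "legacy": {"class": "NetworkTopologyStrategy", "us-east": "3", "DC1": "2"},
    "scratch": {"class": "SimpleStrategy", "replication_factor": "1"},
}
known_dcs = {"us-east", "us-west"}

print(find_unknown_dcs(keyspaces, known_dcs))
```

Any keyspace it reports is a candidate for the replication fix described above.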

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com




DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world’s most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Jul 20, 2015 at 11:02 AM, David Comer <david.comer@outlook.com> wrote:

May I please be discontinued from this email?

 

May I unsubscribe?

 

 

From: John Wong [mailto:gokoproject@gmail.com]
Sent: Monday, July 20, 2015 8:37 AM
To: user@cassandra.apache.org
Subject: Re: OpsCenter datastax-agent 300% CPU

 

Hi all & Sebastian,

We recently encountered a similar issue; at least, we observed the agent constantly dying with OOM. Unfortunately, we are still on 1.2.X, and it will be a while before we can fully move to the Cassandra 2 series.

Is there a backported patch that fixes the OOM in the OpsCenter 5.1 branch? Please let us know, because losing OpsCenter is a huge deal for administrators.

Thank you.

 

On Wed, Jul 15, 2015 at 6:28 PM, Mikhail Strebkov <strebkov@gmail.com> wrote:

Thanks, I think it got resolved after an update.

 

Kind regards,

Mikhail

 

On Wed, Jul 15, 2015 at 2:04 PM, Sebastian Estevez <sebastian.estevez@datastax.com> wrote:

OpsCenter 5.2 has a couple of fixes that may address the symptoms you described:

 

· Fixed issues with agent OOM when storing metrics for large numbers of tables. (OPSC-5934)

· Improved handling of metrics overflow queue on agent. (OPSC-4618)

 

 

Let us know if this stops once you upgrade.


All the best,

 


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


 

On Tue, Jul 14, 2015 at 4:40 PM, Mikhail Strebkov <strebkov@gmail.com> wrote:

 

On Tue, Jul 14, 2015 at 12:01 PM, Mikhail Strebkov <strebkov@gmail.com> wrote:

OpsCenter 5.1.3 and datastax-agent-5.1.3-standalone.jar

 

On Tue, Jul 14, 2015 at 12:00 PM, Sebastian Estevez <sebastian.estevez@datastax.com> wrote:

What version of the agents and what version of OpsCenter are you running?

I recently saw something like this and upgrading to matching versions fixed the issue.

On Jul 14, 2015 2:58 PM, "Mikhail Strebkov" <strebkov@gmail.com> wrote:

Hi everyone,

 

Recently I've noticed that most of the nodes have OpsCenter agents running at 300% CPU. Each node has 4 cores, so agents are using 75% of total available CPU.
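
The arithmetic behind that figure, as a small Python sketch. It relies on top's default mode, where a process's %CPU is counted per core, so a 4-core node tops out at 400%. The function name is made up for illustration.

```python
def share_of_total_cpu(top_cpu_percent, cores):
    """Convert a per-core %CPU reading from top into a share of the whole machine."""
    return top_cpu_percent / cores


# 300% on a 4-core node is three of the four cores fully busy.
print(share_of_total_cpu(300, 4))
```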

 

We're running 5 nodes with open-source Cassandra 2.1.8 in AWS using the Community AMI. The OpsCenter version is 5.1.3. We're using Oracle Java version 1.8.0_45.

 

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

31501 cassandr  20   0 3599m 296m  14m S  339  2.0  48:20.39 /opt/jdk/jdk1.8.0_45/bin/java -Xmx128M -Djclouds.mpu.parts.magnitude=100000 -Djclouds.mpu.parts.size=16777216 -Dopscenter.ssl.trustStore=/var/lib/datastax-agent/ssl/agentKeyStore -Dopscenter.ssl.keyStore=/var/lib/datastax-agent/ssl/agentKeyStore -Dopscenter.ssl.keyStorePassword=opscenter -Dagent-pidfile=/var/run/datastax-agent/datastax-agent.pid -Dlog4j.configuration=file:/etc/datastax-agent/log4j.properties -Djava.security.auth.login.config=/etc/datastax-agent/kerberos.config -jar datastax-agent-5.1.3-standalone.jar /var/lib/datastax-agent/conf/address.yaml

 

The logs from the agent look strange to me: https://gist.github.com/kluyg/21f78af7adff0a940ed3

 

The cluster itself seems to be fine: the load is small, and there is nothing bad in the Cassandra system.log.

 

Does anyone know what to tune to bring it back to normal?

 

Thanks,

Mikhail