cassandra-user mailing list archives

From "Ney, Richard" <Richard....@Aspect.com>
Subject Re: Trying to find cause of exception
Date Mon, 02 Jan 2017 18:59:17 GMT
Hi Amit,

I’m seeing “Not marking nodes down” warnings in the logs, like this one:

WARN  [GossipTasks:1] 2016-12-29 08:48:02,665 FailureDetector.java:287 - Not marking nodes down due to local pause of 6641241564 > 5000000000
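
For scale, the two numbers in that warning look like nanoseconds (an assumption, based on the 5000000000 threshold matching a 5 second limit), which would make this a pause of roughly 6.6 seconds against a 5 second limit. A minimal Python sketch of that arithmetic; the regex and the parse_local_pause name are illustrative only and not part of Cassandra:

```python
import re

# Matches the tail of the FailureDetector warning quoted above.
PAUSE_RE = re.compile(r"local pause of (\d+) > (\d+)")

def parse_local_pause(line):
    """Return (pause_seconds, threshold_seconds), or None if the line does not match.

    Assumes both numbers in the message are nanoseconds.
    """
    m = PAUSE_RE.search(line)
    if m is None:
        return None
    pause_ns, threshold_ns = (int(g) for g in m.groups())
    return pause_ns / 1e9, threshold_ns / 1e9

line = ("WARN  [GossipTasks:1] 2016-12-29 08:48:02,665 FailureDetector.java:287 - "
        "Not marking nodes down due to local pause of 6641241564 > 5000000000")
pause_s, threshold_s = parse_local_pause(line)
print(f"paused {pause_s:.2f}s, threshold {threshold_s:.2f}s")
# prints: paused 6.64s, threshold 5.00s
```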

The ends of the system.log files on all three nodes in one of the data centers are now full
of NullPointerExceptions and AssertionErrors like the ones below. Would these errors be the
cause or a symptom?


WARN  [SharedPool-Worker-1] 2017-01-02 07:13:56,441 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.NullPointerException: null
WARN  [SharedPool-Worker-1] 2017-01-02 07:15:02,865 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.AssertionError: null
    at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:88) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.BufferCell.tombstone(BufferCell.java:83) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.BufferCell.purge(BufferCell.java:175) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.ComplexColumnData.lambda$purge$107(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.ComplexColumnData.transformAndFilter(ComplexColumnData.java:170) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:165) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.ComplexColumnData.purge(ComplexColumnData.java:43) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.BTreeRow.lambda$purge$102(BTreeRow.java:333) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.utils.btree.BTree$FiltrationTracker.apply(BTree.java:650) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:693) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.utils.btree.BTree.transformAndFilter(BTree.java:668) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.BTreeRow.transformAndFilter(BTreeRow.java:338) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.BTreeRow.purge(BTreeRow.java:333) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.partitions.PurgeFunction.applyToRow(PurgeFunction.java:88) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:116) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:133) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:79) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:294) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:292) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:50) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
WARN  [SharedPool-Worker-2] 2017-01-02 07:15:03,132 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2461) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_111]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [apache-cassandra-3.3.0.jar:3.3.0]
    at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.3.0.jar:3.3.0]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
Caused by: java.lang.NullPointerException: null
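
When weighing cause against symptom, it may help to tally which exception classes dominate the log before reading individual traces. A minimal Python sketch, assuming each "Uncaught exception on thread" line is immediately followed by the exception class, as in the excerpt above; count_uncaught is a hypothetical helper name, not a Cassandra tool:

```python
from collections import Counter

def count_uncaught(lines):
    """Count exception classes reported on the line after 'Uncaught exception on thread'."""
    counts = Counter()
    expect_exception = False
    for raw in lines:
        line = raw.strip()
        if "Uncaught exception on thread" in line:
            expect_exception = True
        elif expect_exception and line:
            # e.g. "java.lang.AssertionError: null" -> "java.lang.AssertionError"
            counts[line.split(":", 1)[0]] += 1
            expect_exception = False
    return counts

# Abbreviated copy of the excerpt above.
sample = """\
WARN  [SharedPool-Worker-1] 2017-01-02 07:13:56,441 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.NullPointerException: null
WARN  [SharedPool-Worker-1] 2017-01-02 07:15:02,865 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.AssertionError: null
    at org.apache.cassandra.db.rows.BufferCell.<init>(BufferCell.java:49) ~[apache-cassandra-3.3.0.jar:3.3.0]
WARN  [SharedPool-Worker-2] 2017-01-02 07:15:03,132 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-2,5,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
""".splitlines()

for name, n in count_uncaught(sample).most_common():
    print(n, name)
```

On a real node the same function could be fed `open("system.log")`; the path depends on the installation.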


RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
+1 (978) 848.6640 WORK
+1 (916) 846.2353 MOBILE
UNITED STATES
richard.ney@aspect.com<mailto:richard.ney@aspect.com>
aspect.com<http://www.aspect.com/>


From: Amit Singh F <amit.f.singh@ericsson.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, January 2, 2017 at 4:34 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Trying to find cause of exception

Hello,

A few pointers:


a.)    Can you check system.log on the node that reports the error for messages like
“marking as down”? If there are any, please check for GC pauses. Heavy load is one of the
causes of this.

b.)    Can you try connecting to that node with cqlsh once you start getting these messages?
Are you able to connect?
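
For pointer (a), one way to look for long GC pauses is to scan system.log for GCInspector entries. A minimal Python sketch; the message shape "<collector> GC in <N>ms" is an assumption about the GCInspector log format, the sample lines are invented for illustration and are not from the logs in this thread, and the 1000 ms cutoff is arbitrary:

```python
import re

# Assumed GCInspector message shape: "<collector name> GC in <N>ms"
GC_RE = re.compile(r"GCInspector\.java:\d+ - (.+?) GC in (\d+)ms")

def long_gc_pauses(lines, min_ms=1000):
    """Yield (collector, millis) for GC log lines at or above min_ms."""
    for line in lines:
        m = GC_RE.search(line)
        if m and int(m.group(2)) >= min_ms:
            yield m.group(1), int(m.group(2))

# Illustrative lines only; they are not taken from the logs in this thread.
sample = [
    "WARN  [Service Thread] 2016-12-29 08:48:02,700 GCInspector.java:282 - G1 Old Generation GC in 6412ms.",
    "INFO  [Service Thread] 2016-12-29 08:50:10,101 GCInspector.java:284 - G1 Young Generation GC in 210ms.",
]
print(list(long_gc_pauses(sample)))
# prints: [('G1 Old Generation', 6412)]
```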


Regards
Amit

From: Ney, Richard [mailto:Richard.Ney@Aspect.com]
Sent: Monday, January 02, 2017 3:30 PM
To: user@cassandra.apache.org
Subject: Trying to find cause of exception

My development team has been trying to track down the cause of the read timeout exception
below (30 seconds or more at times). We’re running a two data center deployment with 3 nodes
in each data center. Our tables are set up with replication factor = 2, and we have 16 GB
dedicated to the heap, with G1GC for garbage collection. Our systems are AWS M4.2xlarge
instances with 8 CPUs and 32 GB of RAM, and each node has 2 general purpose EBS volumes of
500 GB each. Once we start getting these timeouts the cluster doesn’t recover, and we are
required to shut all Cassandra nodes down and restart. If anyone has any tips on where to
look or what commands to run to help us diagnose this issue, we’d be eternally grateful.

2017-01-02 04:33:35.161 [ERROR] [report-compute.ffbec924-ce44-11e6-9e21-0adb9d2dd624] [reportCompute] [ahlworkerslave2.bos.manhattan.aspect-cloud.net:31312] [WorktypeMetrics] Persistence failure when replaying events for persistenceId [/fsms/pens/worktypes/bmwbpy.314]. Last known sequence number [0]
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at akka.persistence.cassandra.package$$anon$1$$anonfun$run$1.apply(package.scala:17)
    at scala.util.Try$.apply(Try.scala:192)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
    at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:115)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:124)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:477)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:62)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:266)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:246)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
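
The driver message above says the read ran at consistency ONE, needed one replica response, and received none before the timeout. If many such errors have to be compared across client logs, the fields can be pulled out mechanically. A minimal Python sketch; parse_read_timeout is a hypothetical helper, and the regex only covers the read-timeout wording shown in this trace:

```python
import re

TIMEOUT_RE = re.compile(
    r"Cassandra timeout during read query at consistency (\w+) "
    r"\((\d+) responses were required but only (\d+) replica responded\)"
)

def parse_read_timeout(message):
    """Return a dict with consistency/required/received, or None if no match."""
    # Collapse whitespace so messages wrapped across lines still match.
    m = TIMEOUT_RE.search(" ".join(message.split()))
    if m is None:
        return None
    return {
        "consistency": m.group(1),
        "required": int(m.group(2)),
        "received": int(m.group(3)),
    }

msg = """Cassandra timeout during read query at consistency ONE (1 responses were required but only
0 replica responded)"""
print(parse_read_timeout(msg))
# prints: {'consistency': 'ONE', 'required': 1, 'received': 0}
```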



This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain
information that is confidential. If you have received this message in error, please do not
read, copy or forward this message. Please notify the sender immediately, delete it from your
system and destroy any copies. You may not further disclose or distribute this email or its
attachments.