cassandra-commits mailing list archives

From "A Markov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8696) nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could not create snapshot
Date Mon, 29 Jun 2015 13:27:07 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605603#comment-14605603
] 

A Markov commented on CASSANDRA-8696:
-------------------------------------

Yuki, I am not sure that increasing the timeout to 1 hour is a good solution. We are running
2.1.7 and hitting a situation where repair stops completely for an hour. I might be wrong,
but it looks like repair doesn't start another session until all tasks of the current session
have finished one way or another. So if one of the tasks of the current session fails without
an immediate message, as in our case with exactly the same failed-snapshot error

 RepairJob.java:145 - Error occurred during snapshot phase

repair just idles for an hour and resumes its work only after processing that exception. As a
result, the system cannot finish repair in a realistic time (it is still running after 7 days).
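
One rough way to check for the idle period described above is to measure how long the log was silent before each snapshot-phase error. The sketch below is not part of Cassandra or nodetool. The helper names (parse_ts, snapshot_error_gaps) are hypothetical, and it assumes unwrapped system.log lines with the timestamp layout seen in the excerpt quoted in this issue. The preceding log line may be unrelated to repair, so the gap is only an indicator.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the system.log excerpt quoted in this issue.
TS_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
SNAPSHOT_ERROR = "Error occurred during snapshot phase"


def parse_ts(line):
    """Return the first timestamp found in a log line, or None."""
    m = TS_RE.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def snapshot_error_gaps(lines):
    """Yield (error_time, seconds since the previous timestamped line)."""
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # stack-trace frames carry no timestamp
        if SNAPSHOT_ERROR in line and prev is not None:
            yield ts, (ts - prev).total_seconds()
        prev = ts


# Two lines adapted from the log quoted in this issue (messages shortened).
sample = [
    "INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 "
    "StreamingRepairTask.java:96 - streaming task succeed",
    "ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 "
    "RepairJob.java:127 - Error occurred during snapshot phase",
]

for when, gap in snapshot_error_gaps(sample):
    print(when, f"{gap:.3f}s since previous log line")
```

On a node showing the behaviour described in this comment, one would expect gaps close to the configured timeout (about 3600 seconds) rather than the few seconds seen in the original 2.1.2 log.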

> nodetool repair on cassandra 2.1.2 keyspaces return java.lang.RuntimeException: Could
not create snapshot
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8696
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeff Liu
>            Assignee: Yuki Morishita
>             Fix For: 2.1.x
>
>
> When trying to run nodetool repair -pr on a Cassandra node (2.1.2), Cassandra throws Java
exceptions: cannot create snapshot.
> The error log from system.log:
> {noformat}
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:28,815 StreamResultFuture.java:166 -
[Stream #692c1450-a692-11e4-9973-070e938df227 ID#0] Prepare completed. Receiving 2 files(221187
bytes), sending 5 files(632105 bytes)
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 StreamResultFuture.java:180 -
[Stream #692c1450-a692-11e4-9973-070e938df227] Session with /10.97.9.110 is complete
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,046 StreamResultFuture.java:212 -
[Stream #692c1450-a692-11e4-9973-070e938df227] All sessions completed
> INFO  [STREAM-IN-/10.97.9.110] 2015-01-28 02:07:29,047 StreamingRepairTask.java:96 -
[repair #685e3d00-a692-11e4-9973-070e938df227] streaming task succeed, returning response
to /10.98.194.68
> INFO  [RepairJobTask:1] 2015-01-28 02:07:29,065 StreamResultFuture.java:86 - [Stream
#692c6270-a692-11e4-9973-070e938df227] Executing streaming plan for Repair
> INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,065 StreamSession.java:213
- [Stream #692c6270-a692-11e4-9973-070e938df227] Starting streaming to /10.66.187.201
> INFO  [StreamConnectionEstablisher:4] 2015-01-28 02:07:29,070 StreamCoordinator.java:209
- [Stream #692c6270-a692-11e4-9973-070e938df227, ID#0] Beginning stream session with /10.66.187.201
> INFO  [STREAM-IN-/10.66.187.201] 2015-01-28 02:07:29,465 StreamResultFuture.java:166
- [Stream #692c6270-a692-11e4-9973-070e938df227 ID#0] Prepare completed. Receiving 5 files(627994
bytes), sending 5 files(632105 bytes)
> INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,971 StreamResultFuture.java:180 - [Stream
#692c6270-a692-11e4-9973-070e938df227] Session with /10.66.187.201 is complete
> INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 StreamResultFuture.java:212 - [Stream
#692c6270-a692-11e4-9973-070e938df227] All sessions completed
> INFO  [StreamReceiveTask:22] 2015-01-28 02:07:31,972 StreamingRepairTask.java:96 - [repair
#685e3d00-a692-11e4-9973-070e938df227] streaming task succeed, returning response to /10.98.194.68
> ERROR [RepairJobTask:1] 2015-01-28 02:07:39,444 RepairJob.java:127 - Error occurred during
snapshot phase
> java.lang.RuntimeException: Could not create snapshot at /10.97.9.110
>         at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:77)
~[apache-cassandra-2.1.2.jar:2.1.2]
>         at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:347)
~[apache-cassandra-2.1.2.jar:2.1.2]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> INFO  [AntiEntropySessions:6] 2015-01-28 02:07:39,445 RepairSession.java:260 - [repair
#6f85e740-a692-11e4-9973-070e938df227] new session: will sync /10.98.194.68, /10.66.187.201,
/10.226.218.135 on range (128171798046680518737469720690862638799,128635403083592540777731520865977436165]
for events.[bigint0text, bigint0boolean,
bigint0int, dataset_catalog, column_categories, bigint0double, bigint0bigint]
> ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,445 RepairSession.java:303 - [repair
#685e3d00-a692-11e4-9973-070e938df227] session completed with the following error
> java.io.IOException: Failed during snapshot creation.
>         at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
~[apache-cassandra-2.1.2.jar:2.1.2]
>         at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> ERROR [AntiEntropySessions:5] 2015-01-28 02:07:39,446 CassandraDaemon.java:153 - Exception
in thread Thread[AntiEntropySessions:5,5,RMI Runtime]
> java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
>         at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.jar:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_45]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[na:1.7.0_45]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_45]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
> Caused by: java.io.IOException: Failed during snapshot creation.
>         at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
~[apache-cassandra-2.1.2.jar:2.1.2]
>         at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128) ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.jar:na]
>         ... 3 common frames omitted
> {noformat}
> The only change we made recently was to change the keyspace replication factor from 2 to
3 before seeing those errors. At the same time, we started seeing timeout errors from the application.

> The timeout error is something like:
> {noformat}
> core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency
ONE (1 responses were required but only 0 replica responded)
>     at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Responses$Error.asException(Responses.java:100) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:110)
~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:249)
~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:433) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:668)
~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) ~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
~[io.netty.netty-3.9.0.Final.jar:na]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[na:1.7.0_55]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
~[na:1.7.0_55]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_55]
> Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout
during read query at consistency ONE (1 responses were required but only 0 replica responded)
>     at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:61) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168) ~[com.datastax.cassandra.cassandra-driver-core-2.1.3.jar:na]
>     at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
~[io.netty.netty-3.9.0.Final.jar:na]
>     ... 21 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
