cassandra-user mailing list archives

From Jason Wee <peich...@gmail.com>
Subject Re: Experiencing Timeouts on one node
Date Tue, 07 Jul 2015 06:45:43 GMT
> 3. How do we rebuild System keyspace?

Wipe this node and start it all over again.

hth

jason

On Tue, Jul 7, 2015 at 12:16 AM, Shashi Yachavaram <shashi007@gmail.com>
wrote:

> When we reboot the problematic node, we see the following errors in
> system.log.
>
> 1. Does this mean the hints column family is corrupted?
> 2. Can we scrub the system column family on the problematic node and its
> replication partners?
> 3. How do we rebuild the System keyspace?
>
> ==================================================================
> ERROR [CompactionExecutor:950] 2015-06-27 20:11:44,595 CassandraDaemon.java (line 191) Exception in thread Thread[CompactionExecutor:950,1,main]
> java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
> at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
> at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
> at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
> at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> ERROR [HintedHandoff:552] 2015-06-27 20:11:44,595 CassandraDaemon.java (line 191) Exception in thread Thread[HintedHandoff:552,1,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
> at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:436)
> at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:282)
> at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:90)
> at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:502)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
> at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
> at java.util.concurrent.FutureTask.get(Unknown Source)
> at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:432)
> ... 6 more
> Caused by: java.lang.AssertionError: originally calculated column size of 8684 but now it is 15725
> at org.apache.cassandra.db.compaction.LazilyCompactedRow.write(LazilyCompactedRow.java:135)
> at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
> at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:162)
> at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
> at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
> at org.apache.cassandra.db.compaction.CompactionManager$7.runMayThrow(CompactionManager.java:442)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> ==================================================================
>
>
> On Wed, Jul 1, 2015 at 11:59 AM, Shashi Yachavaram <shashi007@gmail.com>
> wrote:
>
>> We have a 28-node cluster, out of which only one node is experiencing
>> timeouts. We thought it was the RAID, but there are two other nodes on
>> the same RAID without any problem. Also, the problem goes away if we
>> reboot the node, and then reappears after seven days. The following
>> hinted hand-off timeouts are seen on the node experiencing the timeouts.
>> Also, we did not notice any gossip errors.
>>
>> I was wondering if anyone has seen this issue and how they resolved it.
>>
>> Cassandra Version: 1.2.15.1
>> OS: Linux cm 2.6.32-504.8.1.el6.x86_64 #1 SMP Fri Dec 19 12:09:25 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
>> java version "1.6.0_85"
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------------------
>> INFO [HintedHandoff:2] 2015-06-17 22:52:08,130 HintedHandOffManager.java (line 296) Started hinted handoff for host: 4fe86051-6bca-4c28-b09c-1b0f073c1588 with IP: /192.168.1.122
>> INFO [HintedHandoff:1] 2015-06-17 22:52:08,131 HintedHandOffManager.java (line 296) Started hinted handoff for host: bbf0878b-b405-4518-b649-f6cf7c9a6550 with IP: /192.168.1.119
>> INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.122; aborting (0 delivered)
>> INFO [HintedHandoff:2] 2015-06-17 22:52:17,635 HintedHandOffManager.java (line 296) Started hinted handoff for host: f7b7ab10-4d42-4f0c-af92-2934a075bee3 with IP: /192.168.1.108
>> INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.119; aborting (0 delivered)
>> INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java (line 296) Started hinted handoff for host: ddb79f35-3e2b-4be8-84d8-7942086e2b73 with IP: /192.168.1.104
>> INFO [HintedHandoff:2] 2015-06-17 22:52:27,143 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.108; aborting (0 delivered)
>> INFO [HintedHandoff:2] 2015-06-17 22:52:27,144 HintedHandOffManager.java (line 296) Started hinted handoff for host: 6a2fa431-4a51-44cb-af19-1991c960e075 with IP: /192.168.1.117
>> INFO [HintedHandoff:1] 2015-06-17 22:52:27,153 HintedHandOffManager.java (line 422) Timed out replaying hints to /192.168.1.104; aborting (0 delivered)
>> INFO [HintedHandoff:1] 2015-06-17 22:52:27,154 HintedHandOffManager.java (line 296) Started hinted handoff for host: cf03174a-533c-44d6-a679-e70090ad2bc5 with IP: /192.168.1.107
>>
>> ------------------------------------------------------------------------------------------------------------------------------------
>>
>> Thanks
>> -shashi..
>>
>
>

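For anyone triaging similar logs: the quoted system.log excerpt shows every hint replay aborting with 0 delivered, and it can help to tally those timeouts per target endpoint. Below is a minimal sketch, not part of the original thread, that assumes the "Timed out replaying hints to /<ip>; aborting (<n> delivered)" message format quoted above. The function name `count_hint_timeouts` and the `SAMPLE` text are illustrative only; the script tolerates mail-wrapped lines and leading ">" quote markers.

```python
import re
from collections import Counter

# Pattern for the timeout message as it appears in the quoted 1.2.x log.
# Assumption: the wording matches the excerpt in this thread exactly.
TIMEOUT_RE = re.compile(
    r"Timed out replaying hints to /(?P<ip>[\d.]+); "
    r"aborting \((?P<delivered>\d+) delivered\)"
)


def count_hint_timeouts(log_text):
    """Return a Counter mapping endpoint IP -> number of hint replay timeouts."""
    # Drop email quote markers and leading whitespace from each line.
    lines = (re.sub(r"^[>\s]+", "", line) for line in log_text.splitlines())
    # Re-join wrapped lines and collapse runs of whitespace so a message
    # split across several lines still matches the pattern.
    flat = " ".join(" ".join(lines).split())
    return Counter(m.group("ip") for m in TIMEOUT_RE.finditer(flat))


# Two entries copied from the log excerpt in this thread, wrapped as mailed.
SAMPLE = """\
>>  INFO [HintedHandoff:2] 2015-06-17 22:52:17,634 HintedHandOffManager.java
>> (line 422) Timed out replaying hints to /192.168.1.122; aborting (0
>> delivered)
>>  INFO [HintedHandoff:1] 2015-06-17 22:52:17,643 HintedHandOffManager.java
>> (line 422) Timed out replaying hints to /192.168.1.119; aborting (0
>> delivered)
"""

if __name__ == "__main__":
    for ip, n in count_hint_timeouts(SAMPLE).most_common():
        print(ip, n)
```

If the timeouts are spread evenly over all targets, as in the excerpt, that points at the sending node rather than at any one recipient.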