incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra 1.1.8 timeouts on clients
Date Fri, 08 Feb 2013 01:59:39 GMT
First, check your nodes for I/O errors. You have some bad data there. 

When you restart Cassandra, it may identify which SSTables are corrupt. You can then stop the node and remove them. 

You will then need to run repair to replace the missing data. 

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/02/2013, at 1:21 PM, Terry Cumaranatunge <cumarana@gmail.com> wrote:

> I may have found a trigger that is causing these problems. Has anyone seen these compaction problems in 1.1? I did run scrub on all my 1.0 data to convert it to 1.1 and to fix level-manifest problems before I started running 1.1.
> 
> 1st node:
> ERROR [CompactionExecutor:281] 2013-02-06 23:56:16,183 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:281,1,main]
> java.io.IOError: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
>         at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
>         at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
>         at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>         at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
>         at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.cassandra.db.ColumnSerializer$CorruptColumnException: invalid column name length 0
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:98)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:234)
>         at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:112)
>         ... 21 more
> 
> 2nd node:
> ERROR [CompactionExecutor:266] 2013-02-06 23:51:35,181 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[CompactionExecutor:266,1,main]
> java.io.IOError: java.io.EOFException
>         at org.apache.cassandra.db.compaction.PrecompactedRow.merge(PrecompactedRow.java:116)
>         at org.apache.cassandra.db.compaction.PrecompactedRow.<init>(PrecompactedRow.java:99)
>         at org.apache.cassandra.db.compaction.CompactionController.getCompactedRow(CompactionController.java:176)
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:83)
>         at org.apache.cassandra.db.compaction.CompactionIterable$Reducer.getReduced(CompactionIterable.java:68)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:118)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>         at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
>         at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>         at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
>         at java.util.concurrent.FutureTask.run(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>         at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.EOFException
>         at java.io.RandomAccessFile.readFully(Unknown Source)
>         at java.io.RandomAccessFile.readFully(Unknown Source)
>         at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:95)
>         at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:401)
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:363)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:144)
> 
> On Wed, Feb 6, 2013 at 11:32 AM, Terry Cumaranatunge <cumarana@gmail.com> wrote:
> I've been getting timeouts on clients when using Cassandra 1.1.8 in a cluster of 12 nodes, but I don't see the same behavior when using Cassandra 1.0.10. So, to run a controlled experiment, I tried the following:
> 
> 1. Started with Cassandra 1.0.10 and ran our test tools against it to build a database.
> 2. Ran the workload to make sure no timeouts were seen, then stopped the load.
> 3. Upgraded only 2 of the 12 nodes in the cluster to 1.1.8, then ran scrub, as the documentation states, to convert the sstables to the 1.1 format and to fix level-manifest problems.
> 4. Started the load back up.
> 5. After some time, started seeing client timeouts for requests that go to the 1.1.8 nodes (i.e. requests sent to those nodes as the coordinator node).
> 
> There appears to be a pattern in these timeouts: a large burst of them occurs every 10 minutes, on the 10-minute boundaries of the hour (10:10:XX, 10:20:YY, 10:30:ZZ, etc.). All clients see the timeouts from those two 1.1.8 nodes at exactly the same time. The workload is not I/O bound at this point, and the tpstats output shows no dropped requests. I don't see any hinted handoff messages either (I believe hinted handoff runs every 10 minutes). The key cache size is set to 2.7GB, and the memtable size is 1/3 of the heap (2.7GB). Based on the heap size calculator, key cache memory usage is the same as in 1.0.10. There are no GC pauses or heap pressure messages of any kind in the logs. This is with Java 1.6.0.38.
> 
> Does anyone know of any periodic tasks in Cassandra 1.1 that run every 10 minutes and could explain this problem, or have any other ideas?
> 
> Thanks
> 
> 
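Both compaction traces fail inside ColumnSerializer.deserialize while reading a column from an SSTable: one with "invalid column name length 0", the other with an EOFException thrown by RandomAccessFile.readFully. The Java sketch below uses a simplified, hypothetical record layout (an unsigned-short name length, the name, an int value length, the value); it is not Cassandra's actual on-disk format, and ColumnReaderSketch, readColumn and classify are illustrative names only. It shows how a zero length is rejected outright as corrupt, while a length that points past the end of the data surfaces as an EOFException, which fits the diagnosis of bad data on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

/**
 * Simplified, hypothetical record layout (NOT Cassandra's real SSTable format):
 *   [unsigned short nameLength][name bytes][int valueLength][value bytes]
 * It only illustrates how damaged bytes can produce the two errors in the traces.
 */
public class ColumnReaderSketch {

    /** Illustrative stand-in for a "structurally impossible record" error. */
    static class CorruptColumnException extends IOException {
        CorruptColumnException(String message) {
            super(message);
        }
    }

    /** Reads one record and returns {name, value}. */
    static byte[][] readColumn(DataInputStream in) throws IOException {
        int nameLength = in.readUnsignedShort();
        if (nameLength == 0) {
            // An empty name can never be valid, so the header itself is damaged.
            throw new CorruptColumnException("invalid column name length 0");
        }
        byte[] name = new byte[nameLength];
        in.readFully(name); // throws EOFException if the data ends early

        int valueLength = in.readInt();
        if (valueLength < 0) {
            throw new CorruptColumnException("invalid column value length " + valueLength);
        }
        byte[] value = new byte[valueLength];
        in.readFully(value); // a length pointing past the end surfaces as EOFException
        return new byte[][] {name, value};
    }

    /** Labels a single encoded record as "ok", "corrupt: ..." or "eof". */
    static String classify(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            readColumn(in);
            return "ok";
        } catch (CorruptColumnException e) {
            return "corrupt: " + e.getMessage();
        } catch (EOFException e) {
            return "eof";
        } catch (IOException e) {
            return "io: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(new byte[] {0, 1, 'a', 0, 0, 0, 1, 'x'})); // ok
        System.out.println(classify(new byte[] {0, 0, 0, 0, 0, 1, 'x'}));      // corrupt: invalid column name length 0
        System.out.println(classify(new byte[] {0, 1, 'a', 0, 0, 0, 5, 'x'})); // eof
    }
}
```

The point of the sketch is that neither error says anything about the reader being wrong: both are what a length-prefixed reader does when the bytes on disk are damaged or truncated.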

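To check the reported pattern of timeout bursts on the 10-minute boundaries of the hour, client-side timeout timestamps can be grouped by the 10-minute window they fall in. This is a minimal sketch that assumes the timeouts have already been extracted from client logs as times of day; TimeoutBuckets, countNearBoundary and the sample times in main are hypothetical and are not data from this thread.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeoutBuckets {

    /** Floors a time of day to the start of its 10-minute window, e.g. 10:23:41 becomes 10:20. */
    static LocalTime tenMinuteBoundary(LocalTime t) {
        return LocalTime.of(t.getHour(), (t.getMinute() / 10) * 10);
    }

    /**
     * Counts timeouts per 10-minute window, keeping only those that occur
     * less than withinSeconds after the window's boundary.
     */
    static Map<LocalTime, Integer> countNearBoundary(List<LocalTime> timeouts, long withinSeconds) {
        Map<LocalTime, Integer> counts = new TreeMap<>();
        for (LocalTime t : timeouts) {
            LocalTime boundary = tenMinuteBoundary(t);
            long offset = Duration.between(boundary, t).getSeconds();
            if (offset < withinSeconds) {
                counts.merge(boundary, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample timestamps, not data from this thread.
        List<LocalTime> timeouts = Arrays.asList(
                LocalTime.of(10, 10, 5),
                LocalTime.of(10, 10, 40),
                LocalTime.of(10, 14, 30),
                LocalTime.of(10, 20, 12));
        System.out.println(countNearBoundary(timeouts, 60)); // {10:10=2, 10:20=1}
    }
}
```

If nearly all timeouts land in the first minute after each boundary, that supports looking for a periodic task on the 1.1.8 nodes rather than load-driven slowness.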

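The sizing figures in the message also imply a heap size: if the memtables are 1/3 of the heap and that equals 2.7GB, the heap is about 8.1GB, and a 2.7GB key cache plus 2.7GB of memtables is about two-thirds of it. The arithmetic is sketched below, assuming both the key cache and the memtables count against the JVM heap; HeapBudget and its method names are illustrative only.

```java
/**
 * Back-of-the-envelope heap budget from the figures in the message.
 * Assumes both the key cache and the memtables live on the JVM heap.
 */
public class HeapBudget {

    /** Heap size implied by a memtable budget that is a given fraction of the heap. */
    static double impliedHeapGb(double memtableGb, double memtableFraction) {
        return memtableGb / memtableFraction;
    }

    /** Fraction of the heap taken by key cache plus memtables. */
    static double shareOfHeap(double keyCacheGb, double memtableGb, double heapGb) {
        return (keyCacheGb + memtableGb) / heapGb;
    }

    public static void main(String[] args) {
        double memtableGb = 2.7; // "memtable size is 1/3 of heap (2.7GB)"
        double keyCacheGb = 2.7; // "Key cache size is set to 2.7GB"
        double heapGb = impliedHeapGb(memtableGb, 1.0 / 3.0);
        double share = shareOfHeap(keyCacheGb, memtableGb, heapGb);
        System.out.printf("implied heap: %.1f GB%n", heapGb);
        System.out.printf("key cache + memtables: %.1f GB (%.0f%% of heap)%n",
                keyCacheGb + memtableGb, share * 100);
    }
}
```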