cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CASSANDRA-5158) Compaction fails after hours of frequent updates
Date Sat, 23 Mar 2013 23:13:15 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-5158.
---------------------------------------

    Resolution: Cannot Reproduce
    
> Compaction fails after hours of frequent updates
> ------------------------------------------------
>
>                 Key: CASSANDRA-5158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5158
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.8
>         Environment: Virtualized Ubuntu 12.04 (linux 2.6.18-274.3.1.el5), 
> SAN disk, 
> tested with Sun JRE 1.6.0_24 and 1.6.0_38-b05, 
> 8 Gb of RAM
>            Reporter: Tommi Koivula
>
> Data corruption occurs in one of our customers' environments after ~10 hours of running
> with a constant update load (1-2 writes/sec). Compaction of one of the column families starts
> failing after about 10 hours, and scrub is not able to fix it either. Writes are not producing
> any errors, and compaction of the other column families works fine. We are not doing any reads
> at this stage, and we always start from a fresh Cassandra instance (no data). After the failure
> we have tried restarting Cassandra, which succeeds, and retrieving all keys, which fails with
> a similar error.
> The Cassandra configuration is the default except for directory locations. The number of
> rows in the table is always less than 100k and doesn't increase, as our application purges
> old data on an hourly basis. The problematic table is very simple, with only one column
> besides the key:
>    CREATE TABLE xxx ( KEY uuid PRIMARY KEY, value text) WITH gc_grace_seconds=0;
> We have tried enabling and disabling JNA with no effect. The JRE was also upgraded
> to 1.6.0_38-b05 with no change.
> The problem reproduces every time at some point, but we haven't been able to reproduce
> it faster, so it seems to require running several hours of updates. We had this problem with
> Cassandra 1.1.0 and upgraded to 1.1.8, but the upgrade didn't fix it. The cluster has only
> one node, so the application is suffering total data loss.
> The error message varies a bit between runs; here are two, produced by two different
> Cassandra 1.1.8 runs:
> [CompactionExecutor:15]|||Exception in thread Thread[CompactionExecutor:15,1,main] java.lang.NegativeArraySizeException: null
>         at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:57)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:144)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:70)
>         at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:189)
>         at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:153)
>         at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:145)
>         at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:40)
>         at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:149)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:126)
>         at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
>         at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:173)
>         at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
> [CompactionExecutor:127]|org.apache.cassandra.utils.OutputHandler$LogOutput||Non-fatal error reading row (stacktrace follows)
> java.io.IOError: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:156)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:86)
>         at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:170)
>         at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:496)
>         at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:485)
>         at org.apache.cassandra.db.compaction.CompactionManager.access$300(CompactionManager.java:69)
>         at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:235)
>         at org.apache.cassandra.db.compaction.CompactionManager$3.call(CompactionManager.java:205)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662) [na:1.6.0_38]
> Caused by: java.io.EOFException: bloom filter claims to be -793390915 bytes, longer than entire row size -3407588055136546636
>         at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:129)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:121)
>         ... 12 common frames omitted
> Data directory for the table looks like this:
>  $ ls -l data/p/m
> total 100644
> -rw-r--r-- 1 user group     3342 Jan 14 14:08 p-m_nodes-hf-1-CompressionInfo.db
> -rw-r--r-- 1 user group  8809019 Jan 14 14:08 p-m_nodes-hf-1-Data.db
> -rw-r--r-- 1 user group   155632 Jan 14 14:08 p-m_nodes-hf-1-Filter.db
> -rw-r--r-- 1 user group 11441829 Jan 14 14:08 p-m_nodes-hf-1-Index.db
> -rw-r--r-- 1 user group     4340 Jan 14 14:08 p-m_nodes-hf-1-Statistics.db
> -rw-r--r-- 1 user group     3310 Jan 14 17:06 p-m_nodes-hf-2-CompressionInfo.db
> -rw-r--r-- 1 user group  8948738 Jan 14 17:06 p-m_nodes-hf-2-Data.db
> -rw-r--r-- 1 user group   138992 Jan 14 17:06 p-m_nodes-hf-2-Filter.db
> -rw-r--r-- 1 user group 11531149 Jan 14 17:06 p-m_nodes-hf-2-Index.db
> -rw-r--r-- 1 user group     4340 Jan 14 17:06 p-m_nodes-hf-2-Statistics.db
> -rw-r--r-- 1 user group     3270 Jan 14 21:19 p-m_nodes-hf-3-CompressionInfo.db
> -rw-r--r-- 1 user group  9246229 Jan 14 21:19 p-m_nodes-hf-3-Data.db
> -rw-r--r-- 1 user group   119776 Jan 14 21:19 p-m_nodes-hf-3-Filter.db
> -rw-r--r-- 1 user group 11633474 Jan 14 21:19 p-m_nodes-hf-3-Index.db
> -rw-r--r-- 1 user group     4340 Jan 14 21:19 p-m_nodes-hf-3-Statistics.db
> -rw-r--r-- 1 user group     3350 Jan 15 06:35 p-m_nodes-hf-4-CompressionInfo.db
> -rw-r--r-- 1 user group  8766723 Jan 15 06:35 p-m_nodes-hf-4-Data.db
> -rw-r--r-- 1 user group   158792 Jan 15 06:35 p-m_nodes-hf-4-Filter.db
> -rw-r--r-- 1 user group 11425708 Jan 15 06:35 p-m_nodes-hf-4-Index.db
> -rw-r--r-- 1 user group     4340 Jan 15 06:35 p-m_nodes-hf-4-Statistics.db
> -rw-r--r-- 1 user group     3318 Jan 15 10:27 p-m_nodes-hf-6-CompressionInfo.db
> -rw-r--r-- 1 user group  8748453 Jan 15 10:27 p-m_nodes-hf-6-Data.db
> -rw-r--r-- 1 user group   141904 Jan 15 10:27 p-m_nodes-hf-6-Filter.db
> -rw-r--r-- 1 user group 11518264 Jan 15 10:27 p-m_nodes-hf-6-Index.db
> -rw-r--r-- 1 user group     4340 Jan 15 10:27 p-m_nodes-hf-6-Statistics.db
> Commit log directory:
> $ ls -l commitlog/
> total 164012
> -rw-r--r-- 1 user group 33554432 Jan 14 12:44 CommitLog-1358164638228.log
> -rw-r--r-- 1 user group 33554432 Jan 15 10:50 CommitLog-1358164638237.log
> -rw-r--r-- 1 user group 33554432 Jan 15 11:59 CommitLog-1358164638238.log
> -rw-r--r-- 1 user group 33554432 Jan 15 10:27 CommitLog-1358164638239.log
> -rw-r--r-- 1 user group 33554432 Jan 15 11:48 CommitLog-1358164638240.log
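
Editor's note: both failures quoted above surface at the point where the serialized bloom
filter length is read from the row header: the compaction run dies with a
NegativeArraySizeException in IndexHelper.skipBloomFilter, and scrub reports an EOFException
because the claimed bloom filter size exceeds the (also negative, hence corrupted) row size.
The following is a minimal, self-contained Java sketch of that kind of length check. It is
not the actual Cassandra 1.1 IndexHelper code, and the readBloomFilter helper is hypothetical;
it only illustrates how a negative serialized length produces exactly these two symptoms.

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;

    public class BloomFilterLengthSketch {

        // Hypothetical reader: the serialized bloom filter is assumed to be prefixed by a
        // signed 32-bit length. If the on-disk data is corrupted, that length can come back
        // negative and either trip the size sanity check (the EOFException seen in the scrub
        // log) or blow up when a buffer of that size is allocated (the
        // NegativeArraySizeException seen in the compaction log).
        static byte[] readBloomFilter(DataInputStream in, long remainingRowBytes) throws IOException {
            int claimedSize = in.readInt();
            if (claimedSize >= remainingRowBytes) {
                throw new EOFException("bloom filter claims to be " + claimedSize
                        + " bytes, longer than entire row size " + remainingRowBytes);
            }
            byte[] serialized = new byte[claimedSize]; // a negative size throws NegativeArraySizeException
            in.readFully(serialized);
            return serialized;
        }

        public static void main(String[] args) throws IOException {
            // Simulate a row header whose length field holds the corrupted value from the
            // log above: 0xD0B5D0BD is -793390915 when read as a big-endian signed int.
            byte[] corrupted = { (byte) 0xD0, (byte) 0xB5, (byte) 0xD0, (byte) 0xBD };
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(corrupted));
            try {
                readBloomFilter(in, -3407588055136546636L); // corrupted "row size" from the log
            } catch (EOFException e) {
                System.out.println(e.getMessage()); // matches the message in the scrub log
            }
        }
    }

Nothing in this sketch points to the root cause of the corruption itself; it only shows why
both stack traces surface where the bloom filter length is deserialized.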

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
