cassandra-commits mailing list archives

From "J.B. Langston (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization
Date Wed, 12 Nov 2014 15:31:36 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208119#comment-14208119 ]

J.B. Langston edited comment on CASSANDRA-7386 at 11/12/14 3:31 PM:
--------------------------------------------------------------------

I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better.
This also needs to be backported to 2.0 if at all possible.  In several cases I've seen severe
imbalances like the ones described, with some drives completely full and others
at 10-20% utilization.

Here are a couple of stack traces. It happens during both flushes and compactions.

{code}
ERROR [FlushWriter:6241] 2014-09-07 08:27:35,298 CassandraDaemon.java (line 198) Exception in thread Thread[FlushWriter:6241,5,main]
FSWriteError in /data6/system/compactions_in_progress/system-compactions_in_progress-tmp-jb-8222-Index.db
	at org.apache.cassandra.io.util.SequentialWriter.flushData(SequentialWriter.java:267)
	at org.apache.cassandra.io.util.SequentialWriter.flushInternal(SequentialWriter.java:219)
	at org.apache.cassandra.io.util.SequentialWriter.syncInternal(SequentialWriter.java:191)
	at org.apache.cassandra.io.util.SequentialWriter.close(SequentialWriter.java:381)
	at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:481)
	at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
	at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
	at org.apache.cassandra.db.Memtable$FlushRunnable.writeSortedContents(Memtable.java:417)
	at org.apache.cassandra.db.Memtable$FlushRunnable.runWith(Memtable.java:350)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: No space left on device
	at java.io.RandomAccessFile.writeBytes0(Native Method)
	at java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)
	at java.io.RandomAccessFile.write(RandomAccessFile.java:550)
	at org.apache.cassandra.io.util.SequentialWriter.flushData(SequentialWriter.java:263)
	... 13 more

ERROR [CompactionExecutor:9166] 2014-09-06 16:09:14,786 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:9166,1,main]
FSWriteError in /data6/keyspace_1/data/keyspace_1-data-tmp-jb-13599-Filter.db
	at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
	at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
	at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
	at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.write(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:295)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
	at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
	at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
	at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
	at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
	... 13 more
{code}


was (Author: jblangston@datastax.com):
I've seen a lot of users hitting this issue lately, so the sooner we can get a patch the better.
This also needs to be backported to 2.0 if at all possible.  In several cases I've seen severe
imbalances like the ones described, with some drives completely full and others
at 10-20% utilization.

> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7386
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Lohfink
>            Assignee: Alan Boudreault
>            Priority: Minor
>             Fix For: 2.1.3
>
>         Attachments: 7386-v1.patch, 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png,
patch_2_1_branch_proto.diff, sstable-count-second-run.png
>
>
> Currently, disks are picked first by number of current tasks, then by free
space.  This helps with performance but can lead to large differences in utilization in some
(unlikely but possible) scenarios.  I've seen 55% to 10% and heard reports of 90% to 10% on
IRC.  This happens with both LCS and STCS (although my suspicion is that STCS makes it worse,
since it is harder to keep balanced).
> I propose changing the algorithm slightly to include some maximum range of utilization, past
which it will pick by free space over load (acknowledging it can be slower).  So if disk A is
30% full and disk B is 5% full, it will never pick A over B until they balance out.
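
The proposal above could be sketched roughly as follows. This is only an illustration of the idea, not the code in any of the attached patches: the `JbodDiskPicker` class, the `Disk` snapshot type, the `maxSpread` parameter and the `/data1`, `/data2` paths are all hypothetical names, and it assumes equally sized disks so that "most free space" and "lowest used fraction" mean the same thing.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; none of these names exist in Cassandra.
final class JbodDiskPicker {

    /** Hypothetical snapshot of one data directory. */
    static final class Disk {
        final String path;
        final int activeTasks;     // flushes/compactions currently writing here
        final double usedFraction; // 0.0 = empty, 1.0 = full

        Disk(String path, int activeTasks, double usedFraction) {
            this.path = path;
            this.activeTasks = activeTasks;
            this.usedFraction = usedFraction;
        }
    }

    // Ordering as described in the ticket: fewest current tasks first,
    // then most free space (here: lowest used fraction).
    static final Comparator<Disk> BY_LOAD_THEN_FREE =
            Comparator.comparingInt((Disk d) -> d.activeTasks)
                      .thenComparingDouble((Disk d) -> d.usedFraction);

    // Ordering used once the disks are too unbalanced: free space only.
    static final Comparator<Disk> BY_FREE_ONLY =
            Comparator.comparingDouble((Disk d) -> d.usedFraction);

    /**
     * Picks a disk for the next write. If the gap between the fullest and
     * emptiest disk exceeds maxSpread (an assumed tunable, e.g. 0.20),
     * load is ignored and the emptiest disk wins.
     */
    static Disk pick(List<Disk> disks, double maxSpread) {
        if (disks.isEmpty()) {
            throw new IllegalArgumentException("no data directories");
        }
        double min = Collections.min(disks, BY_FREE_ONLY).usedFraction;
        double max = Collections.max(disks, BY_FREE_ONLY).usedFraction;
        boolean unbalanced = (max - min) > maxSpread;
        return Collections.min(disks, unbalanced ? BY_FREE_ONLY : BY_LOAD_THEN_FREE);
    }

    public static void main(String[] args) {
        // Disk A is 30% full and idle; disk B is 5% full but busy.
        List<Disk> disks = Arrays.asList(
                new Disk("/data1", 0, 0.30),
                new Disk("/data2", 3, 0.05));
        System.out.println(pick(disks, 0.20).path);
    }
}
```

With a threshold of 0.20, the 25-point gap in the example forces the pick onto the emptier disk even though it has more tasks running; with a looser threshold the load-based ordering applies as before. Where the threshold should sit, and whether it should be configurable, is left open here.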



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
