Subject: Re: Cassandra 2.0.7 keeps reporting errors due to no space left on device
From: Yatong Zhang
To: user@cassandra.apache.org
Date: Sun, 4 May 2014 17:10:56 +0800

I am using the latest 2.0.7.
'nodetool tpstats' shows the following:

[root@storage5 bin]# ./nodetool tpstats
> Pool Name                  Active   Pending   Completed   Blocked   All time blocked
> ReadStage                       0         0      628220         0                  0
> RequestResponseStage            0         0     3342234         0                  0
> MutationStage                   0         0     3172116         0                  0
> ReadRepairStage                 0         0       47666         0                  0
> ReplicateOnWriteStage           0         0           0         0                  0
> GossipStage                     0         0      756024         0                  0
> AntiEntropyStage                0         0           0         0                  0
> MigrationStage                  0         0           0         0                  0
> MemoryMeter                     0         0        6652         0                  0
> MemtablePostFlusher             0         0        7042         0                  0
> FlushWriter                     0         0        4023         0                  0
> MiscStage                       0         0           0         0                  0
> PendingRangeCalculator          0         0          27         0                  0
> commitlog_archiver              0         0           0         0                  0
> InternalResponseStage           0         0           0         0                  0
> HintedHandoff                   0         0          28         0                  0
>
> Message type        Dropped
> RANGE_SLICE               0
> READ_REPAIR               0
> PAGED_RANGE               0
> BINARY                    0
> READ                      0
> MUTATION                  0
> _TRACE                    0
> REQUEST_RESPONSE          0
> COUNTER_MUTATION          0

And here is another type of error; these errors seem to occur after the disk is full:

> ERROR [SSTableBatchOpen:2] 2014-04-30 13:47:48,348 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:2,5,main]
> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>         at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>         at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>         at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>         at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>         at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>         at java.io.DataInputStream.readUTF(DataInputStream.java:589)
>         at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>         at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:85)
>         ... 12 more
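A quick way to check whether compactions are backing up over time, or are failing and leaving temporary SSTables behind (a shell sketch; the /dataN/cass paths are the ones mentioned in this thread and the log location is only a packaged-install default, so adjust both as needed):

# Pool backlog at this instant (watch the Pending/Blocked columns).
nodetool tpstats

# Outstanding compactions and what is currently being compacted.
nodetool compactionstats

# Leftover temporary SSTables from failed or interrupted compactions, largest first.
find /data{1..6}/cass -name '*-tmp-*Data.db' -exec ls -lhS {} + 2>/dev/null | head

# Recent compaction or disk-space errors in the system log.
grep -iE 'compactiontask|fswriteerror|no space left' /var/log/cassandra/system.log | tail -n 20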
On Sun, May 4, 2014 at 4:59 PM, DuyHai Doan wrote:

> The symptoms look like pending compactions are stacking up, or compactions are failing, so temporary files (-tmp-Data.db) are not properly cleaned up.
>
> What is your Cassandra version? Can you do a "nodetool tpstats" and look into the Cassandra logs to see whether there are issues with compactions?
>
> I've found one discussion thread that has the same symptoms:
> http://comments.gmane.org/gmane.comp.db.cassandra.user/22089
>
> On Sun, May 4, 2014 at 10:39 AM, Yatong Zhang wrote:
>
>> Yes, after a while the disk fills up again. So I changed the compaction strategy from 'size tiered' to 'leveled' to reduce the disk usage during compaction, but the problem still occurs.
>>
>> This table gets lots of writes, relatively very few reads, and no updates. Here is the schema of the table:
>>
>> CREATE TABLE mydb.images (
>>   image_id uuid PRIMARY KEY,
>>   available boolean,
>>   message text,
>>   raw_data blob,
>>   time_created timestamp,
>>   url text
>> ) WITH
>>   bloom_filter_fp_chance=0.010000 AND
>>   caching='KEYS_ONLY' AND
>>   comment='' AND
>>   dclocal_read_repair_chance=0.000000 AND
>>   gc_grace_seconds=864000 AND
>>   read_repair_chance=0.100000 AND
>>   replicate_on_write='true' AND
>>   populate_io_cache_on_flush='false' AND
>>   compaction={'sstable_size_in_mb': '192', 'class': 'LeveledCompactionStrategy'} AND
>>   compression={'sstable_compression': 'LZ4Compressor'};
>>
>> On Sun, May 4, 2014 at 4:31 PM, DuyHai Doan wrote:
>>
>>> And after a while the /data6 drive fills up again, right?
>>>
>>> One question: can you please give the CQL3 definition of your "mydb-images-tmp" table?
>>>
>>> What is the access pattern for this table? Lots of writes? Lots of updates?
>>>
>>> On Sun, May 4, 2014 at 10:00 AM, Yatong Zhang wrote:
>>>
>>>> After restarting or running 'cleanup' the big tmp file is gone and everything looks fine:
>>>>
>>>>> -rw-r--r-- 1 root root  19K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-CompressionInfo.db
>>>>> -rw-r--r-- 1 root root 145M Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Data.db
>>>>> -rw-r--r-- 1 root root  64K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Index.db
>>>>
>>>> [root@node5 images]# df -hl
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> /dev/sda1        49G  7.5G   39G  17% /
>>>> tmpfs           7.8G     0  7.8G   0% /dev/shm
>>>> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
>>>> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
>>>> /dev/sdc1       3.6T  466G  3.0T  14% /data3
>>>> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
>>>> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
>>>> /dev/sdf1       3.6T  662M  3.4T   1% /data6
>>>>
>>>> I didn't perform repair, not even once.
>>>>
>>>> On Sun, May 4, 2014 at 2:37 PM, DuyHai Doan wrote:
>>>>
>>>>> Hello Yatong
>>>>>
>>>>> "If I restart the node or using 'cleanup', it will resume to normal." --> what does df -hl show for /data6 when you restart or clean up the node?
>>>>>
>>>>> By the way, a single SSTable of 3.6 TB is kind of huge. Do you perform manual repair frequently?
>>>>>
>>>>> On Sun, May 4, 2014 at 1:51 AM, Yatong Zhang wrote:
>>>>>
>>>>>> My Cassandra cluster has plenty of free space; for now only about 30% of the space is used.
>>>>>>
>>>>>> On Sun, May 4, 2014 at 6:36 AM, Yatong Zhang wrote:
>>>>>>
>>>>>>> Hi there,
>>>>>>>
>>>>>>> It was strange that the 'xxx-tmp-xxx.db' file kept increasing until Cassandra threw exceptions with 'No space left on device'. I am using CQL 3 to create a table that stores about 200K ~ 500K of data per record. I have 6 hard disks per node and Cassandra is configured with 6 data directories (ext4 file systems, CentOS 6.5):
>>>>>>>
>>>>>>>> data_file_directories:
>>>>>>>>     - /data1/cass
>>>>>>>>     - /data2/cass
>>>>>>>>     - /data3/cass
>>>>>>>>     - /data4/cass
>>>>>>>>     - /data5/cass
>>>>>>>>     - /data6/cass
>>>>>>>
>>>>>>> And every directory is on a standalone disk.
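A small shell sketch for keeping an eye on per-directory usage in a JBOD layout like the one above (the /dataN/cass paths are the ones from this thread; the 80% threshold is an arbitrary example value):

#!/usr/bin/env bash
# Warn when any data directory crosses a usage threshold and list the
# largest in-progress/leftover temporary SSTables in that directory.
THRESHOLD=80   # percent used, example value

for dir in /data{1..6}/cass; do
    used=$(df -P "$dir" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    echo "== $dir: ${used}% used"
    [ "$used" -ge "$THRESHOLD" ] && echo "   WARNING: above ${THRESHOLD}%"
    # Largest -tmp- data files (in-progress or leftover compaction output).
    find "$dir" -name '*-tmp-*Data.db' -printf '%s %p\n' 2>/dev/null \
        | sort -rn | head -3 \
        | awk '{ printf "   %8.1f GB  %s\n", $1/1024/1024/1024, $2 }'
done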
>>>>>>> But here is what I found when the error occurred:
>>>>>>>
>>>>>>> [root@node5 images]# ll -hl
>>>>>>>> total 3.6T
>>>>>>>> drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
>>>>>>>> -rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
>>>>>>>> -rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
>>>>>>>> -rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
>>>>>>>> -rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db
>>>>>>>
>>>>>>> [root@node5 images]# df -hl
>>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>>> /dev/sda1        49G  7.5G   39G  17% /
>>>>>>> tmpfs           7.8G     0  7.8G   0% /dev/shm
>>>>>>> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
>>>>>>> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
>>>>>>> /dev/sdc1       3.6T  466G  3.0T  14% /data3
>>>>>>> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
>>>>>>> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
>>>>>>> /dev/sdf1       3.6T  3.6T     0 100% /data6
>>>>>>>
>>>>>>> mydb-images-tmp-jb-91068-Data.db occupied almost all the disk space (a 4 TB hard disk with about 3.6 TB usable), and the error looks like:
>>>>>>>
>>>>>>>> INFO [FlushWriter:4174] 2014-05-04 05:15:15,744 Memtable.java (line 403) Completed flushing /data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16942-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1398900356204, position=25024609)
>>>>>>>> INFO [CompactionExecutor:3689] 2014-05-04 05:15:15,745 CompactionTask.java (line 115) Compacting [SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16940-Data.db'), SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16942-Data.db'), SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16941-Data.db'), SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16939-Data.db')]
>>>>>>>> ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,745 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1245,1,main]
>>>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-92181-Filter.db
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>>>         at java.io.FileOutputStream.write(Native Method)
>>>>>>>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>>>         ... 13 more
>>>>>>>> ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,800 StorageService.java (line 367) Stopping gossiper
>>>>>>>>  WARN [CompactionExecutor:1245] 2014-05-04 05:15:15,800 StorageService.java (line 281) Stopping gossip by operator request
>>>>>>>>  INFO [CompactionExecutor:1245] 2014-05-04 05:15:15,800 Gossiper.java (line 1271) Announcing shutdown
>>>>>>>
>>>>>>> I have changed my table to "LeveledCompactionStrategy" to reduce the disk space needed during compaction, with:
>>>>>>>
>>>>>>>> ALTER TABLE images WITH compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : '192' };
>>>>>>>
>>>>>>> But the problem still exists: the file keeps increasing, and after about 2 or 3 days Cassandra will fail with the 'No space left on device' error. If I restart the node or run 'cleanup', it returns to normal.
>>>>>>>
>>>>>>> I don't know whether it is caused by my configuration or it is just a bug, so would anyone please help to solve this issue?
>>>>>>>
>>>>>>> Thanks
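The restart/'cleanup' workaround described above essentially removes the orphaned -tmp- SSTables that a failed or aborted compaction leaves behind; those files are only written by in-progress compactions and flushes, so removing them while the node is down is generally considered safe (verify this for your exact version before doing so). A rough shell sketch, assuming the data directory layout from this thread and an init script named 'cassandra':

#!/usr/bin/env bash
# Sketch of the manual recovery: stop the node, remove leftover
# temporary compaction output, start the node again.
set -e

service cassandra stop

# Show how much space the leftover -tmp- files occupy before deleting.
find /data{1..6}/cass -name '*-tmp-*' -exec du -ch {} + | tail -n 1

# Remove the leftover temporary SSTable components.
find /data{1..6}/cass -name '*-tmp-*' -delete

service cassandra start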