Subject: Re: Cassandra 2.0.7 keeps reporting errors due to no space left on device
From: Yatong Zhang
To: user@cassandra.apache.org
Date: Sun, 4 May 2014 07:51:11 +0800

My Cassandra cluster has plenty of free space; for now only about 30% of the space is used.

On Sun, May 4, 2014 at 6:36 AM, Yatong Zhang wrote:

> Hi there,
>
> Strangely, the 'xxx-tmp-xxx.db' file kept growing until Cassandra threw
> exceptions with 'No space left on device'. I am using CQL 3 to create a
> table that stores about 200K ~ 500K of data per record. I have 6 hard disks
> per node, and Cassandra is configured with 6 data directories (ext4 file
> systems, CentOS 6.5):
>
>> data_file_directories:
>>     - /data1/cass
>>     - /data2/cass
>>     - /data3/cass
>>     - /data4/cass
>>     - /data5/cass
>>     - /data6/cass
>
> Every directory is on a standalone disk. But this is what I found when the
> error occurred:
>
>> [root@node5 images]# ll -hl
>> total 3.6T
>> drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
>> -rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
>> -rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
>> -rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
>> -rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db
>
> [root@node5 images]# df -hl
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1        49G  7.5G   39G  17% /
> tmpfs           7.8G     0  7.8G   0% /dev/shm
> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
> /dev/sdc1       3.6T  466G  3.0T  14% /data3
> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
> /dev/sdf1       3.6T  3.6T     0 100% /data6
>
> *mydb-images-tmp-jb-91068-Data.db* occupied almost all of the disk space (a
> 4T hard disk with 3.6T of actual usable space), and the error looks like:
>
>> INFO [FlushWriter:4174] 2014-05-04 05:15:15,744 Memtable.java (line 403) Completed flushing /data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16942-Data.db (42 bytes) for commitlog position ReplayPosition(segmentId=1398900356204, position=25024609)
>> INFO [CompactionExecutor:3689] 2014-05-04 05:15:15,745 CompactionTask.java (line 115) Compacting [SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16940-Data.db'), SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16942-Data.db'), SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16941-Data.db'), SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16939-Data.db')]
>> ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,745 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1245,1,main]
>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-92181-Filter.db
>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.io.IOException: No space left on device
>>         at java.io.FileOutputStream.write(Native Method)
>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>         ... 13 more
>> ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,800 StorageService.java (line 367) Stopping gossiper
>> WARN [CompactionExecutor:1245] 2014-05-04 05:15:15,800 StorageService.java (line 281) Stopping gossip by operator request
>> INFO [CompactionExecutor:1245] 2014-05-04 05:15:15,800 Gossiper.java (line 1271) Announcing shutdown
>
> I have changed my table to "LeveledCompactionStrategy" to reduce the disk
> space needed during compaction, with:
>
>> ALTER TABLE images WITH compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : '192' };
>
> But the problem still exists: the file keeps growing, and after about 2 or
> 3 days Cassandra fails with a 'No space left on device' error. If I restart
> the node or run 'cleanup', it returns to normal.
>
> I don't know whether this is caused by my configuration or is just a bug,
> so could anyone please help me solve this issue?
>
> Thanks
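
The symptom in the quoted message (a '-tmp-' SSTable file growing until one data directory is full while the others stay around 38% used) can be watched for with a small script. What follows is only a minimal Python sketch, not part of Cassandra or its tooling. The function names find_tmp_sstables and summarize are hypothetical, and the sketch assumes that in-progress SSTable components can be recognised by '-tmp-' in the file name, as in the directory listing in the message, and that the directories passed in are the ones listed under data_file_directories.

```python
import os
import shutil


def find_tmp_sstables(data_dirs, marker="-tmp-"):
    """Return {data_dir: [(path, size_bytes), ...]} for files whose name
    contains `marker` and ends in .db, largest first.

    Assumption: temporary SSTable components carry '-tmp-' in their name,
    as in mydb-images-tmp-jb-91068-Data.db from the message.
    """
    report = {}
    for data_dir in data_dirs:
        matches = []
        for root, _dirs, files in os.walk(data_dir):
            for name in files:
                if marker in name and name.endswith(".db"):
                    path = os.path.join(root, name)
                    try:
                        matches.append((path, os.path.getsize(path)))
                    except OSError:
                        # The file disappeared between listing and stat,
                        # for example because a compaction just finished.
                        continue
        matches.sort(key=lambda item: item[1], reverse=True)
        report[data_dir] = matches
    return report


def summarize(data_dirs):
    """Return one human-readable line per data directory."""
    lines = []
    for data_dir, matches in find_tmp_sstables(data_dirs).items():
        usage = shutil.disk_usage(data_dir)  # named tuple: total, used, free (bytes)
        tmp_total = sum(size for _path, size in matches)
        lines.append(
            "%s: %d tmp file(s), %.1f GiB in tmp files, %.1f GiB free of %.1f GiB"
            % (data_dir, len(matches), tmp_total / 2**30,
               usage.free / 2**30, usage.total / 2**30)
        )
    return lines


if __name__ == "__main__":
    # Directory names taken from the data_file_directories setting in the message;
    # directories that do not exist on this machine are skipped.
    dirs = ["/data%d/cass" % i for i in range(1, 7)]
    for line in summarize([d for d in dirs if os.path.isdir(d)]):
        print(line)
```

Run periodically (for example from cron), this would show which data directory holds the growing temporary file well before the disk reaches 100%. It only reports sizes and does not touch any files.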