incubator-cassandra-user mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: Cassandra 2.0.7 keeps reporting errors due to no space left on device
Date Sun, 04 May 2014 08:59:07 GMT
The symptoms look like pending or failed compactions stacking up, so that
temporary files (-tmp-Data.db) are not properly cleaned up.
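
To see which temporary files are piling up and on which disk, something like
the following could help. It is only a minimal sketch: it assumes that
leftover compaction output can be recognised by "-tmp-" in the file name (as
in the listing quoted below) and that the data directories are the six
/dataN/cass paths from the quoted cassandra.yaml. The function name
find_tmp_sstables is made up for this example.

```python
import os


def find_tmp_sstables(data_dirs, min_bytes=0):
    """Return (path, size_in_bytes) for every '-tmp-' .db file, largest first.

    Assumption: temporary SSTable components carry '-tmp-' in their file
    name, as in 'mydb-images-tmp-jb-91068-Data.db' from this thread.
    """
    hits = []
    for root_dir in data_dirs:
        # os.walk silently yields nothing for a directory that does not exist.
        for dirpath, dirnames, filenames in os.walk(root_dir):
            # Do not descend into snapshot directories.
            dirnames[:] = [d for d in dirnames if d != "snapshots"]
            for name in filenames:
                if "-tmp-" in name and name.endswith(".db"):
                    path = os.path.join(dirpath, name)
                    try:
                        size = os.path.getsize(path)
                    except OSError:
                        # The file disappeared, e.g. the compaction finished.
                        continue
                    if size >= min_bytes:
                        hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


if __name__ == "__main__":
    # Assumed layout: the six data directories quoted later in this thread.
    dirs = ["/data%d/cass" % i for i in range(1, 7)]
    for path, size in find_tmp_sstables(dirs):
        print("%12.1f MiB  %s" % (size / 2.0 ** 20, path))
```

Running it periodically (from cron, for instance) and comparing the output
over time would show whether one tmp file keeps growing or whether many small
ones accumulate.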

What is your Cassandra version? Can you run "nodetool tpstats" and look
into the Cassandra logs to see whether there are issues with compactions?
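
For the log side, a rough way to spot compaction trouble is to count ERROR
entries per thread pool and collect their "Caused by:" lines. The sketch
below assumes the log line layout visible in the excerpt quoted further down
(level, then [pool:id], then timestamp and message); the function name
summarize_errors is invented for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the log lines quoted in this thread:
#   "ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,745 ..."
LOG_LINE = re.compile(
    r"^\s*(ERROR|WARN|INFO|DEBUG|TRACE)\s+\[([^\]:]+)(?::\d+)?\]\s+(.*)$"
)


def summarize_errors(lines):
    """Count ERROR entries per thread pool and tally their root causes.

    Returns (errors_by_pool, causes), both collections.Counter objects.
    A 'Caused by:' line is attributed to the most recent log entry and is
    only counted when that entry was logged at ERROR level.
    """
    errors_by_pool = Counter()
    causes = Counter()
    in_error = False
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            level, pool = match.group(1), match.group(2)
            in_error = level == "ERROR"
            if in_error:
                errors_by_pool[pool] += 1
        elif in_error and line.lstrip().startswith("Caused by:"):
            causes[line.strip()] += 1
    return errors_by_pool, causes
```

It takes any iterable of lines, so an open log file can be passed in
directly. Many errors under CompactionExecutor together with "No space left
on device" as the cause would point at failed compactions.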

I've found one discussion thread that has the same symptoms:
http://comments.gmane.org/gmane.comp.db.cassandra.user/22089




On Sun, May 4, 2014 at 10:39 AM, Yatong Zhang <blueflycn@gmail.com> wrote:

> Yes, after a while the disk fills up again. So I changed the compaction
> strategy from 'size tiered' to 'leveled' to reduce the disk usage during
> compaction, but the problem still occurs.
>
> This table has lots of writes, relatively few reads, and no updates.
> Here is the schema of the table:
>
> CREATE TABLE mydb.images (
>   image_id uuid PRIMARY KEY,
>   available boolean,
>   message text,
>   raw_data blob,
>   time_created timestamp,
>   url text
> ) WITH
>   bloom_filter_fp_chance=0.010000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'sstable_size_in_mb': '192', 'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
>
> On Sun, May 4, 2014 at 4:31 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
>> And after a while the /data6 drive fills up again, right?
>>
>> One question: can you please give the CQL3 definition of your "mydb-images-tmp"
>> table?
>>
>> What is the access pattern for this table? Lots of writes? Lots of
>> updates?
>>
>>
>>
>>
>> On Sun, May 4, 2014 at 10:00 AM, Yatong Zhang <blueflycn@gmail.com> wrote:
>>
>>> After restarting or running 'cleanup', the big tmp file is gone and everything
>>> looks fine:
>>>
>>>> -rw-r--r-- 1 root root  19K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-CompressionInfo.db
>>>> -rw-r--r-- 1 root root 145M Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Data.db
>>>> -rw-r--r-- 1 root root  64K Apr 30 13:58 mydb_oe-images-tmp-jb-96242-Index.db
>>>>
>>>
>>> [root@node5 images]# df -hl
>>> Filesystem      Size  Used Avail Use% Mounted on
>>> /dev/sda1        49G  7.5G   39G  17% /
>>> tmpfs           7.8G     0  7.8G   0% /dev/shm
>>> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
>>> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
>>> /dev/sdc1       3.6T  466G  3.0T  14% /data3
>>> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
>>> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
>>> /dev/sdf1       3.6T  662M  3.4T   1% /data6
>>>
>>> I have never run a repair, not even once.
>>>
>>>
>>> On Sun, May 4, 2014 at 2:37 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>
>>>> Hello Yatong
>>>>
>>>> "If I restart the node or using 'cleanup', it will resume to normal."
>>>> --> what does df -hl show for /data6 when you restart or clean up the node?
>>>>
>>>> By the way, a single SSTable of 3.6 TB is huge. Do you run
>>>> manual repairs frequently?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Sun, May 4, 2014 at 1:51 AM, Yatong Zhang <blueflycn@gmail.com> wrote:
>>>>
>>>>> My Cassandra cluster has plenty of free space; for now only about 30%
>>>>> of the space is used.
>>>>>
>>>>>
>>>>> On Sun, May 4, 2014 at 6:36 AM, Yatong Zhang <blueflycn@gmail.com> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> It is strange that the 'xxx-tmp-xxx.db' file keeps growing until
>>>>>> Cassandra throws exceptions with 'No space left on device'. I am using CQL 3
>>>>>> to create a table that stores about 200K ~ 500K of data per record. I have 6
>>>>>> hard disks per node, and Cassandra is configured with 6 data
>>>>>> directories (ext4 file systems, CentOS 6.5):
>>>>>>
>>>>>> data_file_directories:
>>>>>>>     - /data1/cass
>>>>>>>     - /data2/cass
>>>>>>>     - /data3/cass
>>>>>>>     - /data4/cass
>>>>>>>     - /data5/cass
>>>>>>>     - /data6/cass
>>>>>>>
>>>>>>
>>>>>> And every directory is on a standalone disk. But this is what I found when
>>>>>> the error occurred:
>>>>>>
>>>>>>> [root@node5 images]# ll -hl
>>>>>>> total 3.6T
>>>>>>> drwxr-xr-x 4 root root 4.0K Jan 20 09:44 snapshots
>>>>>>> -rw-r--r-- 1 root root 456M Apr 30 13:42 mydb-images-tmp-jb-91068-CompressionInfo.db
>>>>>>> -rw-r--r-- 1 root root 3.5T Apr 30 13:42 mydb-images-tmp-jb-91068-Data.db
>>>>>>> -rw-r--r-- 1 root root    0 Apr 30 13:42 mydb-images-tmp-jb-91068-Filter.db
>>>>>>> -rw-r--r-- 1 root root 2.0G Apr 30 13:42 mydb-images-tmp-jb-91068-Index.db
>>>>>>>
>>>>>>
>>>>>> [root@node5 images]# df -hl
>>>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>>>> /dev/sda1        49G  7.5G   39G  17% /
>>>>>> tmpfs           7.8G     0  7.8G   0% /dev/shm
>>>>>> /dev/sda3       3.6T  1.3T  2.1T  38% /data1
>>>>>> /dev/sdb1       3.6T  1.4T  2.1T  39% /data2
>>>>>> /dev/sdc1       3.6T  466G  3.0T  14% /data3
>>>>>> /dev/sdd1       3.6T  1.3T  2.2T  38% /data4
>>>>>> /dev/sde1       3.6T  1.3T  2.2T  38% /data5
>>>>>> /dev/sdf1       3.6T  3.6T     0 100% /data6
>>>>>>
>>>>>> *mydb-images-tmp-jb-91068-Data.db* occupied almost all of the disk
>>>>>> space (a 4T hard disk with 3.6T actually usable), and the error looks like:
>>>>>>
>>>>>>>  INFO [FlushWriter:4174] 2014-05-04 05:15:15,744 Memtable.java (line 403) Completed flushing
>>>>>>> /data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16942-Data.db
>>>>>>> (42 bytes) for commitlog position ReplayPosition(segmentId=1398900356204, position=25024609)
>>>>>>>  INFO [CompactionExecutor:3689] 2014-05-04 05:15:15,745 CompactionTask.java (line 115) Compacting
>>>>>>> [SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16940-Data.db'),
>>>>>>> SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16942-Data.db'),
>>>>>>> SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16941-Data.db'),
>>>>>>> SSTableReader(path='/data3/cass/system/compactions_in_progress/system-compactions_in_progress-jb-16939-Data.db')]
>>>>>>> ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,745 CassandraDaemon.java (line 198) Exception in thread Thread[CompactionExecutor:1245,1,main]
>>>>>>> FSWriteError in /data2/cass/mydb/images/mydb-images-tmp-jb-92181-Filter.db
>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:475)
>>>>>>>         at org.apache.cassandra.io.util.FileUtils.closeQuietly(FileUtils.java:212)
>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter.abort(SSTableWriter.java:301)
>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:209)
>>>>>>>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>>>>>>>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>>>>>>>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>>>>>>>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>>>>>>>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
>>>>>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>>> Caused by: java.io.IOException: No space left on device
>>>>>>>         at java.io.FileOutputStream.write(Native Method)
>>>>>>>         at java.io.FileOutputStream.write(FileOutputStream.java:295)
>>>>>>>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>>>>>>>         at org.apache.cassandra.utils.BloomFilterSerializer.serialize(BloomFilterSerializer.java:34)
>>>>>>>         at org.apache.cassandra.utils.Murmur3BloomFilter$Murmur3BloomFilterSerializer.serialize(Murmur3BloomFilter.java:44)
>>>>>>>         at org.apache.cassandra.utils.FilterFactory.serialize(FilterFactory.java:41)
>>>>>>>         at org.apache.cassandra.io.sstable.SSTableWriter$IndexWriter.close(SSTableWriter.java:468)
>>>>>>>         ... 13 more
>>>>>>> ERROR [CompactionExecutor:1245] 2014-05-04 05:15:15,800 StorageService.java (line 367) Stopping gossiper
>>>>>>>  WARN [CompactionExecutor:1245] 2014-05-04 05:15:15,800 StorageService.java (line 281) Stopping gossip by operator request
>>>>>>>  INFO [CompactionExecutor:1245] 2014-05-04 05:15:15,800 Gossiper.java (line 1271) Announcing shutdown
>>>>>>>
>>>>>>
>>>>>>
>>>>>> I have changed my table to "LeveledCompactionStrategy" to reduce the
>>>>>> disk space needed during compaction, with:
>>>>>>
>>>>>>> ALTER TABLE images WITH compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : '192' };
>>>>>>>
>>>>>>
>>>>>> But the problem still exists: the file keeps growing, and after
>>>>>> about 2 or 3 days Cassandra fails with a 'No space left on device'
>>>>>> error. If I restart the node or using 'cleanup', it will resume to normal.
>>>>>>
>>>>>> I don't know whether this is caused by my configuration or is just a bug, so
>>>>>> would anyone please help me solve this issue?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
