cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alain RODRIGUEZ (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9862) Missing sstable files seen some time after the same file is logged as being part of a compaction
Date Fri, 28 Aug 2015 14:02:46 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718600#comment-14718600
] 

Alain RODRIGUEZ edited comment on CASSANDRA-9862 at 8/28/15 2:01 PM:
---------------------------------------------------------------------

I was thinking about it, but, wouldn't it be to make sure that whatever happens before, compactions
don't get stuck ?

I mean, ok, there was a bug, a file vanished or, at least, Cassandra think it did. That's
a bug. But has this bug to lead to this "shitstorm" where compactions get stuck, number of
files keep increasing and finally the node is stopped due to a "too many files exception"
? Can't we detect this kind of issue and restore node state by dropping not existing files
from compaction ?

If we lost data, repairs + CL should handle that, so it is always better to have this node
running normally. So imho, this would only make Cassandra more robust.

Just sharing my thoughts.


was (Author: arodrime):
I was thinking about it, but, wouldn't it be to make sure that whatever happens before, compactions
don't get stuck ?

I mean, ok, there was a bug, a file vanished or, at least, Cassandra think it did. That's
a bug. But does this bug have to lead to this "shitstorm" where compactions get stuck, number
of files keep increasing and finally the node is stopped due to a "too many files exception"
? Can't we detect this kind of issue and restore node state by dropping not existing files
from compaction ?

If we lost data, repairs + CL should handle that, so it is always better to have this node
running normally. So imho, this would only make Cassandra more robust.

Just sharing my thoughts.

> Missing sstable files seen some time after the same file is logged as being part of a
compaction
> ------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9862
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9862
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Mark Curtis
>            Assignee: Yuki Morishita
>
> The system will sometimes see a file not found exception in a given node’s logs. This
is seen after the same file appears in a compaction statement either earlier on in the logs.
I found the following events of potential interest: 
> I pulled out the missing files using this command:
> {code}
> grep "Caused by: java.io.FileNotFoundException:" * | awk '{print $4}' | sort -u >
files-to-search
> {code}
> Using the following file as an example: my_keyspace-my_table-jb-2085190-Data.db
> File is created from a flush
> system.log.10
> {code}
>    7549  INFO [FlushWriter:1] 2015-06-03 12:25:31,857 Memtable.java (line 395) Completed
flushing /raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085190-Data.d
  7549 b (66804 bytes) for commitlog position ReplayPosition(segmentId=1433333700832, position=2479196)
> {code}
> Next it appears in a compaction
> system.log.10
> {code}
>    8758  INFO [CompactionExecutor:5] 2015-06-03 12:25:37,209 CompactionTask.java (line
115) Compacting [SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2084993-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085164-Data.db'),

> SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085190-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-...
> ...<truncated>...
> {code}
> The next time we see the file its in file not found exceptions
> {code}
> system.log.9
>   53811 java.lang.RuntimeException: java.io.FileNotFoundException: /raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085190-Data.db
(No such file or directory)
> {code}
> So going back to compaction executor 5 in system.log.10
> {code}
>    9102  INFO [CompactionExecutor:5] 2015-06-03 12:25:38,684 ColumnFamilyStore.java (line
794) Enqueuing flush of Memtable-compactions_in_progress@1431151640(0/0 serialized/live bytes,
1 ops)
> ...
>    9107  INFO [CompactionExecutor:5] 2015-06-03 12:25:38,701 CompactionTask.java (line
287) Compacted 32 sstables to [/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-data_my_table-jb-2085196,].
 1,970,986 bytes to 1,880,866 (~95% of original) in 1,492ms = 1.202234MB/s.  13,576 total
partitions merged to 11,243.  Partition merge counts were {1:9379, 2:1474, 3:324, 4:54   9107
, 5:11, 6:1, }
>    9108  INFO [CompactionExecutor:5] 2015-06-03 12:25:38,702 ColumnFamilyStore.java (line
794) Enqueuing flush of Memtable-compactions_in_progress@493376470(357/3570 serialized/live
bytes, 16 ops)
> ...
>     9111  INFO [CompactionExecutor:5] 2015-06-03 12:25:38,717 CompactionTask.java (line
115) Compacting [SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085182-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085192-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085175-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085196-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085194-Data.db'),

> SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085191-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085193-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085195-Data.db'),

> SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2084985-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2085114-Data.db'),
SSTableReader(path='/raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2084901-Data.db')]
>    9112 ERROR [CompactionExecutor:5] 2015-06-03 12:25:38,718 CassandraDaemon.java (line
199) Exception in thread Thread[CompactionExecutor:5,1,main]
>    9113 java.lang.RuntimeException: java.io.FileNotFoundException: /raid0/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-2084901-Data.db
(No such file or directory)
> {code}
> So could it be that because the thread "CompactionExecutor:5" hit an exception in line
9113 that the compaction in 8758 didn't complete?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message