cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-11889) LogRecord: file system race condition may cause verify() to fail
Date Wed, 25 May 2016 02:09:12 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-11889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Stefania updated CASSANDRA-11889:
---------------------------------
    Description: 
The following exception was reported in CASSANDRA-11470. It occurred whilst listing files
with compaction in progress:

{code}
WARN  [CompactionExecutor:2006] 2016-05-23 18:23:31,694 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:eda6b9c36f8df6fe596492c3438d7a38e9b109a6 (123663388
bytes)
INFO  [IndexSummaryManager:1] 2016-05-23 18:24:23,731 IndexSummaryRedistribution.java:74 -
Redistributing index summaries
WARN  [CompactionExecutor:2006] 2016-05-23 18:24:56,669 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:05b6b424194dd19ab7cfbcd53c4979768cd859e9 (256286063
bytes)
WARN  [CompactionExecutor:2006] 2016-05-23 18:26:23,575 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:04e9fac15552b9ae77c27a6cb8d0fd11fdcc24d7 (212445557
bytes)
INFO  [CompactionExecutor:2005] 2016-05-23 18:29:26,839 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61041-big-Data.db')
to candidates
WARN  [CompactionExecutor:2006] 2016-05-23 18:30:34,154 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:edbe6f178503be90911dbf29a55b97a4b095a9ec (183852539
bytes)
INFO  [CompactionExecutor:2006] 2016-05-23 18:31:21,080 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61042-big-Data.db')
to candidates
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173 - Unexpected
files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
last update time [00:00:00] should have been [08:29:36]
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,208 ScheduledReporter.java:119
- RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
java.lang.RuntimeException: Failed to list files in /data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:692) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281)
~[metrics-graphite-3.1.0.jar:3.1.0]
	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: org.apache.cassandra.db.lifecycle.LogTransaction$CorruptTransactionLogException:
Some records failed verification. See earlier in log for details.
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:114)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_91]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:1.8.0_91]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[na:1.8.0_91]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
~[apache-cassandra-3.0.6.jar:3.0.6]
	... 22 common frames omitted
INFO  [CompactionExecutor:2005] 2016-05-23 18:32:27,284 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98189-big-Data.db')
to candidates
INFO  [CompactionExecutor:2005] 2016-05-23 18:34:42,562 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98210-big-Data.db')
to candidates
{code}

The error of interest is:

{code}
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173 - Unexpected
files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
last update time [00:00:00] should have been [08:29:36]
{code}

The problem is this code here in LogRecord:

{code}
    public LogRecord withExistingFiles()
    {
        return make(type, getExistingFiles(), 0, absolutePath.get());
    }

    public static LogRecord make(Type type, List<File> files, int minFiles, String absolutePath)
    {
        long lastModified = files.stream().map(File::lastModified).reduce(0L, Long::max);
        return new LogRecord(type, absolutePath, lastModified, Math.max(minFiles, files.size()));
    }
{code}

If the sstable files are completely deleted after getExistingFiles() has returned them and
before lastModified is calculated, then we may report last update time as zero because the
files no longer exist and still indicate that there are files on disk so that this code in
LogFile.verifyRecord fails:

{code}
        record.status.onDiskRecord = record.withExistingFiles();
        if (record.updateTime != record.status.onDiskRecord.updateTime && record.status.onDiskRecord.numFiles
> 0)
        {
            record.setError(String.format("Unexpected files detected for sstable [%s], " +
                                          "record [%s]: last update time [%tT] should have
been [%tT]",
                                          record.fileName(),
                                          record,
                                          record.status.onDiskRecord.updateTime,
                                          record.updateTime));

        }
{code}

As a consequence, we throw an exception when listing files. In this case we were calculating
the true snapshot size for the metrics.

Note that we are careful to delete files from oldest to newest so that the last update time
should always match what is stored in the txn log, however if all files are deleted then we
report last update time as zero. So, if my analysis is correct, this problem should not be
very frequent.

  was:
The following exception was reported in CASSANDRA-11470. It occurred whilst listing files
with compaction in progress:

{code}
WARN  [CompactionExecutor:2006] 2016-05-23 18:23:31,694 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:eda6b9c36f8df6fe596492c3438d7a38e9b109a6 (123663388
bytes)
INFO  [IndexSummaryManager:1] 2016-05-23 18:24:23,731 IndexSummaryRedistribution.java:74 -
Redistributing index summaries
WARN  [CompactionExecutor:2006] 2016-05-23 18:24:56,669 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:05b6b424194dd19ab7cfbcd53c4979768cd859e9 (256286063
bytes)
WARN  [CompactionExecutor:2006] 2016-05-23 18:26:23,575 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:04e9fac15552b9ae77c27a6cb8d0fd11fdcc24d7 (212445557
bytes)
INFO  [CompactionExecutor:2005] 2016-05-23 18:29:26,839 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61041-big-Data.db')
to candidates
WARN  [CompactionExecutor:2006] 2016-05-23 18:30:34,154 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:edbe6f178503be90911dbf29a55b97a4b095a9ec (183852539
bytes)
INFO  [CompactionExecutor:2006] 2016-05-23 18:31:21,080 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61042-big-Data.db')
to candidates
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173 - Unexpected
files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
last update time [00:00:00] should have been [08:29:36]
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,208 ScheduledReporter.java:119
- RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
java.lang.RuntimeException: Failed to list files in /data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:692) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281)
~[metrics-graphite-3.1.0.jar:3.1.0]
	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_91]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: org.apache.cassandra.db.lifecycle.LogTransaction$CorruptTransactionLogException:
Some records failed verification. See earlier in log for details.
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:114)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_91]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:1.8.0_91]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[na:1.8.0_91]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[na:1.8.0_91]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75)
~[apache-cassandra-3.0.6.jar:3.0.6]
	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
~[apache-cassandra-3.0.6.jar:3.0.6]
	... 22 common frames omitted
INFO  [CompactionExecutor:2005] 2016-05-23 18:32:27,284 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98189-big-Data.db')
to candidates
INFO  [CompactionExecutor:2005] 2016-05-23 18:34:42,562 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98210-big-Data.db')
to candidates
{code}

The error of interest is:

{code}
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173 - Unexpected
files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
last update time [00:00:00] should have been [08:29:36]
{code}

The problem is this code here in LogRecord:

{code}
 public LogRecord withExistingFiles()
    {
        return make(type, getExistingFiles(), 0, absolutePath.get());
    }

    public static LogRecord make(Type type, List<File> files, int minFiles, String absolutePath)
    {
        long lastModified = files.stream().map(File::lastModified).reduce(0L, Long::max);
        return new LogRecord(type, absolutePath, lastModified, Math.max(minFiles, files.size()));
    }
{code}

If the sstable files are completely deleted after getExistingFiles() has returned them and
before lastModified is calculated, then we may report last update time as zero because the
files no longer exist and still indicate that there are files on disk so that this code in
LogFile.verifyRecord fails:

{code}
record.status.onDiskRecord = record.withExistingFiles();
        if (record.updateTime != record.status.onDiskRecord.updateTime && record.status.onDiskRecord.numFiles
> 0)
        {
            record.setError(String.format("Unexpected files detected for sstable [%s], " +
                                          "record [%s]: last update time [%tT] should have
been [%tT]",
                                          record.fileName(),
                                          record,
                                          record.status.onDiskRecord.updateTime,
                                          record.updateTime));

        }
{code}

As a consequence, we throw an exception when listing files. In this case we were calculating
the true snapshot size for the metrics.

Note that we are careful to delete files from oldest to newest so that the last update time
should always match what is stored in the txn log, however if all files are deleted then we
report last update time as zero.


> LogRecord: file system race condition may cause verify() to fail
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11889
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11889
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.0.x, 3.x
>
>
> The following exception was reported in CASSANDRA-11470. It occurred whilst listing files
with compaction in progress:
> {code}
> WARN  [CompactionExecutor:2006] 2016-05-23 18:23:31,694 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:eda6b9c36f8df6fe596492c3438d7a38e9b109a6 (123663388
bytes)
> INFO  [IndexSummaryManager:1] 2016-05-23 18:24:23,731 IndexSummaryRedistribution.java:74
- Redistributing index summaries
> WARN  [CompactionExecutor:2006] 2016-05-23 18:24:56,669 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:05b6b424194dd19ab7cfbcd53c4979768cd859e9 (256286063
bytes)
> WARN  [CompactionExecutor:2006] 2016-05-23 18:26:23,575 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:04e9fac15552b9ae77c27a6cb8d0fd11fdcc24d7 (212445557
bytes)
> INFO  [CompactionExecutor:2005] 2016-05-23 18:29:26,839 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61041-big-Data.db')
to candidates
> WARN  [CompactionExecutor:2006] 2016-05-23 18:30:34,154 BigTableWriter.java:171 - Writing
large partition test_keyspace/test_columnfamily:edbe6f178503be90911dbf29a55b97a4b095a9ec (183852539
bytes)
> INFO  [CompactionExecutor:2006] 2016-05-23 18:31:21,080 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_2-d29dd71045a811e59aff6776bf484396/ma-61042-big-Data.db')
to candidates
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173
- Unexpected files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
last update time [00:00:00] should have been [08:29:36]
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,208 ScheduledReporter.java:119
- RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.lang.RuntimeException: Failed to list files in /data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396
> 	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:692) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281)
~[metrics-graphite-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158)
~[metrics-graphite-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_91]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
[na:1.8.0_91]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: org.apache.cassandra.db.lifecycle.LogTransaction$CorruptTransactionLogException:
Some records failed verification. See earlier in log for details.
> 	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.readTxnLog(LogAwareFileLister.java:114)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.classifyFiles(LogAwareFileLister.java:106)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:1.8.0_91]
> 	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[na:1.8.0_91]
> 	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[na:1.8.0_91]
> 	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[na:1.8.0_91]
> 	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[na:1.8.0_91]
> 	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:1.8.0_91]
> 	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
~[na:1.8.0_91]
> 	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:1.8.0_91]
> 	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[na:1.8.0_91]
> 	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.innerList(LogAwareFileLister.java:75)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:53)
~[apache-cassandra-3.0.6.jar:3.0.6]
> 	... 22 common frames omitted
> INFO  [CompactionExecutor:2005] 2016-05-23 18:32:27,284 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98189-big-Data.db')
to candidates
> INFO  [CompactionExecutor:2005] 2016-05-23 18:34:42,562 LeveledManifest.java:437 - Adding
high-level (L3) BigTableReader(path='/data/cassandra/data/test_keyspace/test_columnfamily_3-37a9cb90b7ac11e48a9b6776bf484396/ma-98210-big-Data.db')
to candidates
> {code}
> The error of interest is:
> {code}
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-23 18:31:21,207 LogFile.java:173
- Unexpected files detected for sstable [ma-91034-big], record [REMOVE:[/data/cassandra/data/test_keyspace/test_columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma-91034-big,1463992176000,8][457420186]]:
last update time [00:00:00] should have been [08:29:36]
> {code}
> The problem is this code here in LogRecord:
> {code}
>     public LogRecord withExistingFiles()
>     {
>         return make(type, getExistingFiles(), 0, absolutePath.get());
>     }
>     public static LogRecord make(Type type, List<File> files, int minFiles, String
absolutePath)
>     {
>         long lastModified = files.stream().map(File::lastModified).reduce(0L, Long::max);
>         return new LogRecord(type, absolutePath, lastModified, Math.max(minFiles, files.size()));
>     }
> {code}
> If the sstable files are completely deleted after getExistingFiles() has returned them
and before lastModified is calculated, then we may report last update time as zero because
the files no longer exist and still indicate that there are files on disk so that this code
in LogFile.verifyRecord fails:
> {code}
>         record.status.onDiskRecord = record.withExistingFiles();
>         if (record.updateTime != record.status.onDiskRecord.updateTime && record.status.onDiskRecord.numFiles
> 0)
>         {
>             record.setError(String.format("Unexpected files detected for sstable [%s],
" +
>                                           "record [%s]: last update time [%tT] should
have been [%tT]",
>                                           record.fileName(),
>                                           record,
>                                           record.status.onDiskRecord.updateTime,
>                                           record.updateTime));
>         }
> {code}
> As a consequence, we throw an exception when listing files. In this case we were calculating
the true snapshot size for the metrics.
> Note that we are careful to delete files from oldest to newest so that the last update
time should always match what is stored in the txn log, however if all files are deleted then
we report last update time as zero. So, if my analysis is correct, this problem should not
be very frequent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message