cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Created] (CASSANDRA-14383) If fsync fails it's always an issue and continuing execution is suspect
Date Fri, 13 Apr 2018 03:24:00 GMT
Ariel Weisberg created CASSANDRA-14383:

             Summary: If fsync fails it's always an issue and continuing execution is suspect
                 Key: CASSANDRA-14383
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Ariel Weisberg
            Assignee: Ariel Weisberg
             Fix For: 4.0.x, 2.1.x, 3.0.x, 3.11.x

We can't catch fsync errors and continue, so we shouldn't have code that does that in C*. There
was a Postgres bug where fsync returned an error and the FS lost data, but subsequent fsyncs
succeeded.

The [LastErrorException code in NativeLibrary.trySync|]
looks a little janky. What's up with that? When would trySync be something we would merely
try? If try is good enough, why do it at all, considering try is the default behavior of a series
of unsynced filesystem operations?
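
To make the contrast with a "try" style sync concrete, here is a minimal Java sketch of a sync call that never swallows a failure. It is not the code in NativeLibrary.trySync; the class name StrictSync, the method syncOrFail, and the FsyncFailedError type are hypothetical names used only for illustration. FileChannel.force is the standard JDK call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class StrictSync {
    /** Hypothetical error type: durability can no longer be assumed. */
    public static final class FsyncFailedError extends Error {
        public FsyncFailedError(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Forces file data and metadata to storage. Any IOException is escalated
     * instead of being logged and ignored, so the caller cannot silently
     * continue after a failed sync.
     */
    public static void syncOrFail(FileChannel channel, Path path) {
        try {
            channel.force(true);
        } catch (IOException e) {
            throw new FsyncFailedError("fsync failed for " + path, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("strict-sync", ".dat");
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            syncOrFail(ch, path); // succeeds on a healthy filesystem
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

What the process does once the error escapes (stop, go read only, and so on) is a separate policy question that this sketch leaves open.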

Also, when we fsync an FD it's not just fsyncing that file; the FS is potentially fsyncing other
data, and the error code we get could be related to that other data, so we can't safely ignore
it. The filesystem could be internally inconsistent as well. This happens because the FS journaling
may force the FS to flush other data as well to preserve the ordering requirements of journaled

If we ignore fsync errors it needs to be for whitelisted reasons such as a bad FD.
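
A whitelist policy along these lines could be sketched as follows. This is only an illustration under stated assumptions: the class FsyncErrorPolicy and its methods are hypothetical, the errno values are the Linux ones, and the whitelist contains only the bad-FD case mentioned above.

```java
import java.util.Set;

public final class FsyncErrorPolicy {
    // Linux errno values (assumption: other platforms may number these differently).
    static final int EIO = 5;   // I/O error
    static final int EBADF = 9; // bad file descriptor

    // Only explicitly whitelisted errors may be ignored; everything else is fatal.
    private static final Set<Integer> IGNORABLE = Set.of(EBADF);

    public static boolean isIgnorable(int errno) {
        return IGNORABLE.contains(errno);
    }

    /** Returns normally for whitelisted errors; throws for anything else. */
    public static void handle(int errno, String description) {
        if (isIgnorable(errno)) {
            return;
        }
        throw new IllegalStateException(
            "fsync failed with errno " + errno + " for " + description);
    }

    public static void main(String[] args) {
        handle(EBADF, "example.db"); // whitelisted, returns normally
        try {
            handle(EIO, "example.db"); // not whitelisted, throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The default is deliberately "fail": an unknown error code is treated as fatal, so new failure modes are not ignored by accident.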

I know we have FSErrorHandler and it makes sense for reads, but I'm not sold on it being the
right answer for writes. We don't retry flushing a memtable or writing to the commit log to
my knowledge. We could go read-only, and I need to check if that is w
