cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14383) If fsync fails it's always an issue and continuing execution is suspect
Date Fri, 13 Apr 2018 16:34:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437541#comment-16437541 ]

Ariel Weisberg commented on CASSANDRA-14383:
--------------------------------------------

There is one usage of trySync on a directory in SequentialWriter, which we use for data we
care about. There is also a usage in LogReplica. Some sync calls go to NativeLibrary
instead of SyncUtil, and we shouldn't be doing that.

Generally, things are a little confused, with some duplication that could be cleaned up. There
is a helper for opening the directory and getting an FD on it, so other code shouldn't be doing
that directly.
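
To make the directory sync point concrete, here is a minimal sketch of a single helper that
opens a directory and fsyncs it. DirectorySync and syncDirectory are hypothetical names, not
the existing Cassandra helper, and the sketch uses plain NIO rather than the native calls
discussed above. It assumes a platform (Linux, macOS) that allows opening a directory for
reading. Failures are rethrown rather than swallowed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not the actual Cassandra code.
public final class DirectorySync {
    private DirectorySync() {}

    /**
     * Fsync a directory so that entries created or renamed in it are durable.
     * Any failure is rethrown as an unchecked exception instead of being logged and ignored.
     */
    public static void syncDirectory(Path dir) {
        // Opening a directory with READ is accepted on Linux and macOS;
        // other platforms may reject it (assumption stated in the lead-in).
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException("fsync of directory " + dir + " failed", e);
        }
    }
}
```

Routing every caller through one method like this would keep the error policy in one place.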

I am still undecided on the bigger question of what we should do when fsync generates an
error.

> If fsync fails it's always an issue and continuing execution is suspect
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-14383
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14383
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Major
>             Fix For: 2.1.x, 3.0.x, 3.11.x, 4.0.x
>
>
> We can't catch fsync errors and continue, so we shouldn't have code that does that in
> C*. There was a Postgres bug where fsync returned an error and the FS lost data, but subsequent
> fsyncs succeeded.
> The [LastErrorException code in NativeLibrary.trySync|https://github.com/apache/cassandra/commit/be313935e54be450d9aaabda7965a2f266e922c9#diff-4258621cdf765f0fea6770db5d40038fR307]
> looks a little janky. What's up with that? When would trySync be something we would merely
> try? If try is good enough, why do it at all, considering try is the default behavior of a series
> of unsynced filesystem operations?
> -Also, when we fsync an FD it's not just fsyncing that file; the FS is potentially fsyncing
> other data, and the error code we get could be related to that other data, so we can't safely
> ignore it. The filesystem could be internally inconsistent as well. This happens because the
> FS journaling may force the FS to flush other data as well to preserve the ordering requirements
> of journaled metadata.- I'm actually not 100% sure when/if this is the case.
> If we ignore fsync errors, it needs to be for whitelisted reasons such as a bad FD.
> I know we have FSErrorHandler and it makes sense for reads, but I'm not sold on it being
> the right answer for writes. We don't retry flushing a memtable or writing to the commit log,
> to my knowledge. We could go read-only, and I need to check whether that is what we do in practice.


