flink-issues mailing list archives

From "Sihua Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8922) Revert FLINK-8859 because it causes segfaults in testing
Date Tue, 13 Mar 2018 11:19:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396836#comment-16396836 ]

Sihua Zhou commented on FLINK-8922:

After several attempts (bumping {{rocksdbjni}} to a higher version, applying {{WriteOptions}}
via {{WriteBatch}}, rewriting the recovery code to rule out possible causes), I still can't
figure out why disabling the WAL leads to a segfault. Maybe it really is a bug in RocksDB, but
I can't find any evidence for that either...

As a next step I want to take a deeper look at RocksDB itself. Here I also want to add some comments
about the cost problem that [~StephanEwen] mentioned. I think that even if we can't disable the
WAL on restore (the worst case), we could still get much better performance than we do now
by using {{WriteBatch}}. Fortunately, {{WriteBatch}} is not very sensitive to the WAL: even with
the WAL enabled, it still performs better than {{put()}} with the WAL disabled (except for the
smallest run of 1000 entries). Here are some statistics (measured on my Mac):
--> put with disableWAL=true VS put with disableWAL=false <--
number     put, disableWAL=true    put, disableWAL=false
1000       6 ms                    17 ms
10000      48 ms                   106 ms
100000     857 ms                  1871 ms
1000000    3654 ms                 9416 ms
--> put with disableWAL=true VS write batch with disableWAL=false <--
number     put, disableWAL=true    write batch, disableWAL=false
1000       4 ms                    5 ms
10000      41 ms                   25 ms
100000     372 ms                  262 ms
1000000    3869 ms                 2751 ms
--> write batch with disableWAL=true VS write batch with disableWAL=false <--
number     write batch, disableWAL=true    write batch, disableWAL=false
1000       3 ms                            4 ms
10000      21 ms                           27 ms
100000     243 ms                          278 ms
1000000    2495 ms                         2818 ms
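To put these numbers in perspective, the following sketch computes the relative WAL overhead (time with the WAL enabled divided by time with it disabled) for {{put()}} and for {{WriteBatch}}, using only the timings posted in this comment; nothing is re-measured. It assumes, as the text implies, that in the last comparison the second timing of each pair is the WAL-enabled one. The class and method names are illustrative only.

```java
/**
 * Sketch: relative WAL overhead derived from the timings posted in this comment.
 * All timings are in ms, copied from the comparisons above.
 */
public class WalOverhead {

    /** Ratio of the WAL-enabled time to the WAL-disabled time. */
    static double overhead(double walEnabledMs, double walDisabledMs) {
        return walEnabledMs / walDisabledMs;
    }

    public static void main(String[] args) {
        int[] entries = {1_000, 10_000, 100_000, 1_000_000};
        // First comparison: put() with disableWAL=true vs disableWAL=false.
        double[] putNoWal = {6, 48, 857, 3654};
        double[] putWal = {17, 106, 1871, 9416};
        // Last comparison: write batch without vs with the WAL (assumed order).
        double[] batchNoWal = {3, 21, 243, 2495};
        double[] batchWal = {4, 27, 278, 2818};

        for (int i = 0; i < entries.length; i++) {
            System.out.printf("%,d entries: put %.2fx, write batch %.2fx%n",
                    entries[i],
                    overhead(putWal[i], putNoWal[i]),
                    overhead(batchWal[i], batchNoWal[i]));
        }
    }
}
```

On these numbers the WAL costs {{put()}} roughly 2.2x to 2.8x, while it costs {{WriteBatch}} only roughly 1.1x to 1.3x, which is what "not very sensitive to the WAL" means here.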
There is already a [JIRA|https://issues.apache.org/jira/browse/FLINK-8845] and a [PR|https://github.com/apache/flink/pull/5650]
for 1.6 related to this (I'm not asking to merge it right now). I will report back if I make
any progress. ;)
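The batching idea discussed in this comment (writing restored entries in groups instead of one {{put()}} per entry) can be illustrated with the minimal sketch below. {{BatchingWriter}}, its capacity and the sink callback are hypothetical names chosen for illustration; this is neither the code from the linked PR nor the rocksdbjni API. In real code the sink would presumably fill an {{org.rocksdb.WriteBatch}} and pass it to {{RocksDB.write(WriteOptions, WriteBatch)}}, optionally with {{WriteOptions.setDisableWAL(true)}}.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: buffer key/value pairs and hand them to a sink in batches
 * instead of issuing one write per entry. The sink stands in for filling a
 * RocksDB WriteBatch and writing it; that wiring is not shown here.
 */
public class BatchingWriter implements AutoCloseable {

    /** A single key/value pair; byte[] mirrors how RocksDB stores entries. */
    static final class Entry {
        final byte[] key;
        final byte[] value;

        Entry(byte[] key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    private final int capacity;
    private final Consumer<List<Entry>> sink;
    private final List<Entry> buffer = new ArrayList<>();

    BatchingWriter(int capacity, Consumer<List<Entry>> sink) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be > 0");
        }
        this.capacity = capacity;
        this.sink = sink;
    }

    /** Buffers one entry and flushes once the buffer reaches capacity. */
    void put(byte[] key, byte[] value) {
        buffer.add(new Entry(key, value));
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Hands all buffered entries to the sink as one batch, then clears the buffer. */
    void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    @Override
    public void close() {
        flush(); // do not lose a partially filled final batch
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BatchingWriter writer = new BatchingWriter(500, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 1_200; i++) {
                writer.put(("k" + i).getBytes(StandardCharsets.UTF_8),
                        ("v" + i).getBytes(StandardCharsets.UTF_8));
            }
        }
        System.out.println(batchSizes); // prints [500, 500, 200]
    }
}
```

Flushing on {{close()}} matters because the last group of entries usually does not fill a whole batch.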

> Revert FLINK-8859 because it causes segfaults in testing
> --------------------------------------------------------
>                 Key: FLINK-8922
>                 URL: https://issues.apache.org/jira/browse/FLINK-8922
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
>            Priority: Major
>             Fix For: 1.5.0
> We need to revert FLINK-8859 because it causes problems with RocksDB that make our automated
> tests fail on Travis. The change actually looks good, and it is currently unclear why it
> could introduce such a problem. This might also be a bug in RocksDB. Nevertheless, for the
> sake of proper release testing, we should revert the change for now.

This message was sent by Atlassian JIRA
