ignite-dev mailing list archives

From Ivan Rakov <ivan.glu...@gmail.com>
Subject Re: Reconsider default WAL mode: we need something between LOG_ONLY and FSYNC
Date Fri, 13 Apr 2018 08:34:13 GMT
Agree with Alex.

We now perform an extra WAL fsync() at the beginning of each checkpoint. We
*have* to wait for that call to complete before we start writing checkpoint
pages; otherwise both the physical records in the WAL and the partition files
in storage can be left inconsistent after a power loss. User threads *don't*
wait for this fsync() directly. However, the total throughput of user threads
can't exceed the total throughput of the checkpoint, which is why user-thread
throughput decreases.
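
To illustrate the ordering constraint only, here is a minimal sketch in plain Java NIO. It is not Ignite's actual checkpoint code: the class name, method names, file names and the 4096-byte page size are all assumptions made for the example. The point it shows is that the WAL is forced to disk synchronously before the first page is written to the partition file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of "fsync the WAL, then write checkpoint pages". */
public class CheckpointOrderingSketch {
    /** Appends a record to the WAL without forcing it to disk (LOG_ONLY-like behaviour). */
    static void appendWalRecord(FileChannel wal, String rec) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((rec + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining())
            wal.write(buf);
    }

    /** Writes one page at offset 0 of the partition file, but only after the WAL is durable. */
    static void checkpoint(FileChannel wal, FileChannel partition, byte[] page) throws IOException {
        // Step 1: synchronous fsync of the WAL. This call blocks until the data is on disk.
        wal.force(true);

        // Step 2: only now is it safe to overwrite pages in the partition file.
        ByteBuffer buf = ByteBuffer.wrap(page);
        long pos = 0;
        while (buf.hasRemaining())
            pos += partition.write(buf, pos);

        // Step 3: make the checkpointed page durable as well.
        partition.force(true);
    }

    /** Runs the whole sequence in the given directory and returns {walSize, partitionSize}. */
    static long[] runOnce(Path dir) throws IOException {
        Path walPath = dir.resolve("wal.log");   // assumed file name
        Path partPath = dir.resolve("part-0.bin"); // assumed file name

        try (FileChannel wal = FileChannel.open(walPath,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND);
             FileChannel part = FileChannel.open(partPath,
                 StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            appendWalRecord(wal, "put k1=v1");
            appendWalRecord(wal, "put k2=v2");
            checkpoint(wal, part, new byte[4096]); // assumed page size
        }
        return new long[] {Files.size(walPath), Files.size(partPath)};
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = runOnce(Files.createTempDirectory("wal-sketch"));
        System.out.println("WAL bytes: " + sizes[0] + ", partition bytes: " + sizes[1]);
    }
}
```

User threads in this sketch would only ever call appendWalRecord(), so they never block on force() themselves; they slow down only when the checkpoint, which does block on it, can't keep up.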

Denis, regarding this:

> Could we run Yardstick or YCSB benchmarks to see how the fixed LOG_ONLY
> affected the performance under the operational load (after the preloading
> part you're referring to is over)?

Please take a look at the benchmark results attached to the
https://issues.apache.org/jira/browse/IGNITE-7754 ticket: "put"
benchmarks represent data loading, and "put-get" benchmarks represent
operational load. As you can see, the degradation under operational load
is 4-5 times smaller than in the data load case.

Best Regards,
Ivan Rakov

On 13.04.2018 11:24, Alexey Goncharuk wrote:
> Dmitriy,
> The point of this fsync is to order FS disk writes to prevent data
> corruption, so this fsync has to be synchronous and cannot be asynchronous
> or delayed.
> Given that we fix correctness, I believe that current results are
> acceptable.
> 2018-04-13 2:48 GMT+03:00 Dmitriy Setrakyan <dsetrakyan@apache.org>:
>> On Thu, Apr 12, 2018 at 9:45 AM, Ivan Rakov <ivan.glukos@gmail.com> wrote:
>>> Dmitriy,
>>> fsync() is a really slow operation - it's the main reason why FSYNC mode is
>>> way slower than LOG_ONLY.
>>> The fix adds extra fsyncs in the necessary parts of the code and nothing more.
>>> Every part is important - at the beginning of the thread I described why.
>>> A 20% slowdown in the benchmark doesn't mean that Ignite itself will become 20%
>>> slower. The benchmark replays only the "data loading" scenario. It signals that
>>> maximum throughput with WAL enabled will be 20% lower. By the way, we
>>> already have an option to disable WAL at runtime for the period of data
>>> loading.
>> Ivan, I get it, but I am sure that you can do more things in parallel. Do
>> we wait for the fsync call to complete? If yes, do we have to wait? Are
>> there other performance optimizations you can add, considering that we are
>> in LOG_ONLY or BACKGROUND modes and disk writes may be delayed?
>> D.
