ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: Apache Ignite 2.4 release
Date Mon, 19 Feb 2018 07:37:09 GMT
Alex,

You got me right. DEFAULT -> LOG_ONLY doesn't introduce any dramatic
changes when comparing 2.3 to 2.4 - Ignite was unsafe out of the box in
2.3, and it is unsafe in 2.4 as well.

The real problem is that we claim to be ACID, while in reality we
are only "AI" out of the box: durability is not guaranteed due to
zero backups and LOG_ONLY, and consistency is not guaranteed due to
PRIMARY_SYNC. Neither Cassandra, nor Mongo, nor any of the others claim
to be ACID, so it is not valid to refer to their defaults.
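
To make the point concrete, here is a minimal sketch in plain Java with no
Ignite dependency. The enums and the SafetyCheck class are hypothetical
stand-ins named after the settings discussed in this thread, not Ignite's
API, and FULL_SYNC is assumed to be the stricter alternative to
PRIMARY_SYNC. The rules only restate the argument above; they are not a
statement of what Ignite actually guarantees.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins named after the settings discussed in this thread.
enum WalMode { FSYNC, LOG_ONLY, BACKGROUND, NONE }
enum SyncMode { FULL_SYNC, PRIMARY_SYNC }

final class SafetyCheck {
    /** Lists the reasons durability (D) or consistency (C) is not guaranteed, per this thread. */
    static List<String> gaps(int backups, WalMode wal, SyncMode sync) {
        List<String> gaps = new ArrayList<>();
        if (backups == 0)
            gaps.add("D: zero backups");
        if (wal != WalMode.FSYNC)
            gaps.add("D: WAL mode " + wal + " does not fsync every update");
        if (sync != SyncMode.FULL_SYNC)
            gaps.add("C: write synchronization mode " + sync);
        return gaps;
    }

    public static void main(String[] args) {
        // Out-of-the-box settings as described in this thread: three gaps.
        System.out.println(gaps(0, WalMode.LOG_ONLY, SyncMode.PRIMARY_SYNC));
        // A stricter configuration leaves no gaps under these rules.
        System.out.println(gaps(1, WalMode.FSYNC, SyncMode.FULL_SYNC));
    }
}
```

Under these rules the defaults fail on both D and C, which is why changing
the WAL mode alone does not make the out-of-the-box setup safe.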

On Mon, Feb 19, 2018 at 10:06 AM, Alexey Goncharuk <
alexey.goncharuk@gmail.com> wrote:

> In terms of 'safety', Ignite default settings are far from optimal. For
> in-memory mode, we have 0 backups by default, which means partition loss in
> case of node failure; we have readFromBackup=true and PRIMARY_SYNC by
> default, which effectively cancels the linearizability property for cache
> updates. So setting the default WAL mode to LOG_ONLY does not seem to be a
> bigger evil than the current state. If we are to move to safer defaults, we
> should change all of the affected settings.
>
> I also want to clarify the difference between guarantees in
> non-fsync modes. We should distinguish the loss of durability (the loss of
> the last update) because the update did not make it to disk and data loss
> because the disk content was shuffled due to an incomplete page write. In
> my understanding, the current situation is:
> FSYNC: loss of durability: not possible, data loss: not possible
> LOG_ONLY: loss of durability: possible only if OS/power fails, data loss:
> possible only if OS/power fails
> BACKGROUND: loss of durability: possible if Ignite process fails, data
> loss: possible only if OS/power fails
>
> The data loss situation can be mitigated in the cluster using a large
> enough replication factor (this is what Dmitriy was describing in the case
> of LOG_ONLY and 3 backups configuration).
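
The matrix above could be written down as a small lookup table. This is
only a sketch in plain Java: the type and method names are hypothetical,
and listing an OS/power failure under BACKGROUND's loss of durability is an
assumption (such a failure also stops the Ignite process).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; the table restates the matrix quoted in this message.
enum Failure { IGNITE_PROCESS, OS_OR_POWER }

enum WalModeGuarantee {
    FSYNC(EnumSet.noneOf(Failure.class), EnumSet.noneOf(Failure.class)),
    LOG_ONLY(EnumSet.of(Failure.OS_OR_POWER), EnumSet.of(Failure.OS_OR_POWER)),
    // Assumption: an OS/power failure also kills the process, so both are listed.
    BACKGROUND(EnumSet.allOf(Failure.class), EnumSet.of(Failure.OS_OR_POWER));

    private final Set<Failure> lastUpdateLoss; // "loss of durability"
    private final Set<Failure> dataLoss;       // on-disk content damaged

    WalModeGuarantee(Set<Failure> lastUpdateLoss, Set<Failure> dataLoss) {
        this.lastUpdateLoss = lastUpdateLoss;
        this.dataLoss = dataLoss;
    }

    boolean mayLoseLastUpdate(Failure f) { return lastUpdateLoss.contains(f); }
    boolean mayLoseData(Failure f) { return dataLoss.contains(f); }
}

final class WalMatrixDemo {
    public static void main(String[] args) {
        // Print every (mode, failure) cell of the matrix.
        for (WalModeGuarantee m : WalModeGuarantee.values())
            for (Failure f : Failure.values())
                System.out.println(m + " / " + f
                    + ": last update lost=" + m.mayLoseLastUpdate(f)
                    + ", data lost=" + m.mayLoseData(f));
    }
}
```
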
>
> Denis,
> I do not think it is fair to compare Ignite defaults to Cassandra's
> defaults, because Cassandra is _not_ a transactional datastore but an
> _eventually consistent_ one, and it claims much weaker guarantees than Ignite.
>
> All in all, I'm ok to change the WAL default right now, but I would revisit
> all those settings in 3.0 and make Ignite safe-first.
>
> 2018-02-17 3:24 GMT+03:00 Denis Magda <dmagda@apache.org>:
>
> > Classic relational databases have no choice but to use FSYNC by
> > default. An RDBMS is all about consistency.
> >
> > Distributed databases try to balance consistency and performance. For
> > instance, why fsync every update if there is usually 1 backup copy?
> > This is probably why VoltDB [1] and Cassandra use modes comparable to
> > Ignite's LOG_ONLY.
> >
> > Ignite, as a distributed database, should care about both consistency and
> > performance.
> >
> > My vote goes to FSYNC, LOG_ONLY (default), BACKGROUND, NONE.
> >
> >
> > [1] https://docs.voltdb.com/UsingVoltDB/CmdLogConfig.php
> >
> > --
> > Denis
> >
> >
> > On Fri, Feb 16, 2018 at 2:14 PM, Dmitriy Setrakyan <dsetrakyan@apache.org>
> > wrote:
> >
> > > Vova,
> > >
> > > I hear your concerns, but at the same time I know that one of the largest
> > > banks in eastern Europe is using Ignite in LOG_ONLY mode with 3 backups to
> > > move money. The rationale is that the probability of 4 servers failing
> > > at the hardware level at the same time is very low. However, if the JVM
> > > process fails on any server, then it can be safely restarted without
> > > losing data. In my view, this is why LOG_ONLY mode makes sense as a default.
> > >
> > > I still vote to change the default to LOG_ONLY, deprecate the DEFAULT name
> > > altogether and add FSYNC mode instead.
> > >
> > > D.
> > >
> > > On Fri, Feb 16, 2018 at 4:05 PM, Vladimir Ozerov <vozerov@gridgain.com>
> > > wrote:
> > >
> > > > Sergey,
> > > >
> > > > We do not have backups by default either, so essentially we are losing
> > > > data by default. Moreover, backups are a less reliable option than fsync,
> > > > because a lot of users cannot afford to put servers on separate power
> > > > circuits, so a single power failure may easily power off the whole
> > > > cluster at once, and data is still lost. This is normal practice even
> > > > for enterprise deployments (e.g. asynchronous replication).
> > > >
> > > > To make things even worse, we employ PRIMARY_SYNC mode by default! So even
> > > > if you configured backups, you still may lose data due to a single node
> > > > failure - just shut down the PRIMARY after the commit is confirmed to the
> > > > client and your recent update will disappear.
> > > >
> > > > So this is what a user should do to make himself safe:
> > > > 1) Learn about WAL modes
> > > > 2) Learn about backups
> > > > 3) Learn about synchronization modes
> > > > 4) Cross his fingers that he understood everything correctly and that
> > > > there are no other hidden surprises in Ignite which could lead to data
> > > > loss.
> > > >
> > > > Way too much for a product claiming to be A*C*ID and persistent, don't
> > > > you think?
> > > >
> > > > Leaving the default WAL mode with fsync resolves all these issues.
> > > >
> > > > Vladimir.
> > > >
> > > >
> > > > On Fri, Feb 16, 2018 at 11:43 PM, Sergey Kozlov <skozlov@gridgain.com>
> > > > wrote:
> > > >
> > > > > I suppose some approaches used by classic databases make no sense for
> > > > > Ignite. The FSYNC requirement for databases comes from their single-host
> > > > > nature: if you have corrupted db files, you have corrupted (lost) data.
> > > > >
> > > > > For Ignite, a sufficient number of backups and the failure detection
> > > > > logic can provide data consistency in the sense of "cluster data
> > > > > consistency".
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Feb 16, 2018 at 8:57 PM, Dmitry Pavlov <dpavlov.spb@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi, all WAL modes except NONE protect from data consistency problems
> > > > > > (B+Tree, pages, etc.), which is why I suggest we avoid saying
> > > > > > 'corrupted' about 'unapplied updates'.
> > > > > >
> > > > > > Log Only and Background may cause unapplied updates in case of
> > > > > > OS/process failures.
> > > > > >
> > > > > > None mode IMO is not an option in case data consistency is needed.
> > > > > >
> > > > > > On Fri, 16 Feb 2018 at 20:49, Valentin Kulichenko <
> > > > > > valentin.kulichenko@gmail.com>:
> > > > > >
> > > > > > > Guys,
> > > > > > >
> > > > > > > While we're on this topic, what is the difference between BACKGROUND
> > > > > > > and NONE in terms of semantics and provided guarantees? To me it looks
> > > > > > > like both guarantee to recover the state since the last checkpoint and
> > > > > > > anything else can potentially be lost, so from the user's perspective
> > > > > > > they are the same. Am I missing something here?
> > > > > > >
> > > > > > > Also there is the following in Javadoc for NONE: "If an Ignite node is
> > > > > > > terminated in NONE mode abruptly, it is likely that the data stored on
> > > > > > > disk is corrupted and work directory will need to be cleared for a node
> > > > > > > restart.". If this is really the case, I'm not sure NONE makes sense at
> > > > > > > all. Why would I enable persistence if I'm likely to clear the storage
> > > > > > > on restart?
> > > > > > >
> > > > > > > -Val
> > > > > > >
> > > > > > > On Fri, Feb 16, 2018 at 8:39 AM, Vladimir Ozerov <vozerov@gridgain.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > What is the reason to have DEFAULT mode at all if you claim LOG_ONLY
> > > > > > > > to be completely safe? :)
> > > > > > > >
> > > > > > > > And how could it be safe, given that without fsync we lose part of
> > > > > > > > the WAL itself in case of a crash?
> > > > > > > >
> > > > > > > > On Fri, 16 Feb 2018 at 19:32, Dmitry Pavlov <dpavlov.spb@gmail.com>:
> > > > > > > >
> > > > > > > > > Thank you. Data can't be corrupted in case of a crash because of WAL
> > > > > > > > > replay (since the last completed checkpoint). Physical records are
> > > > > > > > > used to restore possibly corrupted pages in the persistent store (we
> > > > > > > > > overwrite the so-called 'grey zone' - pages we don't know for sure
> > > > > > > > > have been written).
> > > > > > > > >
> > > > > > > > > The only effect is that one or several of the last transactions are
> > > > > > > > > not written. It is not the same as corrupted data.
> > > > > > > > >
> > > > > > > > > On Fri, 16 Feb 2018 at 19:19, Vladimir Ozerov <vozerov@gridgain.com>:
> > > > > > > > >
> > > > > > > > > > Log only mode is not safe - data might be corrupted in case of
> > > > > > > > > > system crash. Oracle - fsync, Postgres - fsync, SQL Server - fsync,
> > > > > > > > > > Cassandra - similar to our “background”.
> > > > > > > > > >
> > > > > > > > > > On Fri, 16 Feb 2018 at 19:11, Dmitry Pavlov <dpavlov.spb@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > Hi Vladimir,
> > > > > > > > > > >
> > > > > > > > > > > What you are saying definitely makes sense.
> > > > > > > > > > >
> > > > > > > > > > > At the same time, LOG_ONLY is also a safe mode: the user will be
> > > > > > > > > > > able to restore the system after a crash. If that is not true, we
> > > > > > > > > > > should create a critical ticket and fix it.
> > > > > > > > > > >
> > > > > > > > > > > Do you know the defaults of other databases, such as Cassandra,
> > > > > > > > > > > Oracle, Postgres?
> > > > > > > > > > >
> > > > > > > > > > > Sincerely,
> > > > > > > > > > > Dmitriy Pavlov
> > > > > > > > > > >
> > > > > > > > > > > On Fri, 16 Feb 2018 at 18:41, Vladimir Ozerov <vozerov@gridgain.com>:
> > > > > > > > > > >
> > > > > > > > > > > > Igniters,
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry for pouring oil on the flames, but from a database
> > > > > > > > > > > > perspective moving from FSYNC to non-FSYNC mode appears to be a
> > > > > > > > > > > > mistake. When you work with a database, your main expectation is
> > > > > > > > > > > > that it will save your data. All production database vendors make
> > > > > > > > > > > > sure that you are safe, not that you are fast. Moreover, some
> > > > > > > > > > > > vendors even prevent you from being in unsafe mode (e.g. you
> > > > > > > > > > > > cannot disable fsync in SQL Server at all).
> > > > > > > > > > > >
> > > > > > > > > > > > If we continue going in this direction, we will end up with a
> > > > > > > > > > > > product which is unsafe out of the box and requires tons of
> > > > > > > > > > > > documentation to understand how to make it safe. Definitely not
> > > > > > > > > > > > the right message to the market. This is like a car without
> > > > > > > > > > > > brakes - would you like to drive it? If this is a Need For Speed
> > > > > > > > > > > > game and you have unlimited lives (in-memory cache with backing
> > > > > > > > > > > > store), then yes. If this is real life (with persistence) - then
> > > > > > > > > > > > no.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Feb 16, 2018 at 5:20 PM, Dmitriy Setrakyan <
> > > > > > > > > > > > dsetrakyan@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Well, I cannot say that I like the name LOG_ONLY, but I would
> > > > > > > > > > > > > vote to keep it for now, given that it is already documented in
> > > > > > > > > > > > > many places, blogs, and examples.
> > > > > > > > > > > > >
> > > > > > > > > > > > > D.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Feb 16, 2018 at 8:13 AM, Ivan Rakov <
> > > > > > > > > > > > > ivan.glukos@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Looks like it's an Ignite term - I've never heard of it
> > > > > > > > > > > > > > outside Ignite scope.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Though, renaming an existing enum value requires keeping the
> > > > > > > > > > > > > > old one as deprecated. DEFAULT is confusing enough to pay
> > > > > > > > > > > > > > this price.
> > > > > > > > > > > > > > As for LOG_ONLY, I think we can keep it as long as it has
> > > > > > > > > > > > > > good and definitive javadoc.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best Regards,
> > > > > > > > > > > > > > Ivan Rakov
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 16.02.2018 17:07, Dmitriy Setrakyan wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> Igniters, just to clarify, does the term LOG_ONLY mean
> > > > > > > > > > > > > > >> anything in the industry or is this just an Ignite term?
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> D.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Fri, Feb 16, 2018 at 8:03 AM, Anton Vinogradov <
> > > > > > > > > > > > > > >> avinogradov@gridgain.com> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Log only mode: flushes application buffers.
> > > > > > > > > > > > > > >>> So, in synced mode without fsync guarantee. That's why I
> > > > > > > > > > > > > > >>> propose to rename it as SYNC.
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> On Fri, Feb 16, 2018 at 4:49 PM, Ilya Lantukh <
> > > > > > > > > > > > > > >>> ilantukh@gridgain.com> wrote:
> > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > >>> I am OK with either FSYNC or STRICT variant.
> > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>> LOG_ONLY name means "log without fsync".
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>> On Fri, Feb 16, 2018 at 4:05 PM, Dmitriy Setrakyan <
> > > > > > > > > > > > > > >>>> dsetrakyan@apache.org> wrote:
> > > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>> On Fri, Feb 16, 2018 at 7:02 AM, Ivan Rakov <
> > > > > > > > > > > > > > >>>> ivan.glukos@gmail.com> wrote:
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > > >>>>> Why create a new term to define something that has
> > > > > > > > > > > > > > >>>>> already been defined?
> > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > >>>>> That makes sense. I'm ok with FSYNC.
> > > > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > > > >>>>>> Anton, I don't understand why we should rename
> > > > > > > > > > > > > > >>>>>> LOG_ONLY to SYNC. We started this discussion with bad
> > > > > > > > > > > > > > >>>>>> naming of DEFAULT, but this has nothing to do with
> > > > > > > > > > > > > > >>>>>> LOG_ONLY (even though it may be scientific - but SYNC
> > > > > > > > > > > > > > >>>>>> sounds scientific as well).
> > > > > > > > > > > > > > >>>>>>
> > > > > > > > > > > > > > >>>>> I agree with Ivan, we should not go wild with renaming.
> > > > > > > > > > > > > > >>>>> However, I would like to find out what is the meaning
> > > > > > > > > > > > > > >>>>> behind the LOG_ONLY name. Can someone explain?
> > > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > > >>>>> D.
> > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > >>>>>
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>> --
> > > > > > > > > > > > > >>>> Best regards,
> > > > > > > > > > > > > >>>> Ilya
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >>>>
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Sergey Kozlov
> > > > > GridGain Systems
> > > > > www.gridgain.com
> > > > >
> > > >
> > >
> >
>
