ignite-dev mailing list archives

From Vladimir Ozerov <voze...@gridgain.com>
Subject Re: MVCC and IgniteDataStreamer
Date Tue, 14 Aug 2018 09:30:40 GMT
Bypassing WAL will make the whole cache data vulnerable to complete loss in
case of node failure. I would not do this automatically.

On Mon, Jul 16, 2018 at 12:28 PM Ilya Kasnacheev <ilya.kasnacheev@gmail.com>
wrote:

> Hello!
>
> Can we also bypass WAL for such mode automatically?
>
> However, we will definitely need a 'normal' mode of DataStreamer operation,
> for people who use dataStreamer with custom stream transformers on existing
> data in use.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
> 2018-07-14 12:33 GMT+03:00 Vladimir Ozerov <vozerov@gridgain.com>:
>
> > Igniters,
> >
> > Denis is right - please pay attention to IEP-22, as this is how we are
> > going to load data into the grid in future. Note that current data streamer
> > internals are not efficient enough, primarily because they have to interact
> > with page memory, free lists and various B-trees in the regular manner. I think
> > that when IEP-22 is implemented, it will be integrated with the data streamer
> > tightly, and the default way to load data would be:
> > 1) Obtain exclusive table lock
> > 2) Load data bypassing almost all Ignite internals
> > 3) Re-build indexes
> > 4) Release the lock
> >
> > Normally all types of data load should obey transactional semantics if MVCC
> > is enabled, and we should think separately about how to do that for the
> > continuous-streaming case.
> >
> > For now let's focus on the immediate goal for the MVCC release - the data
> > streamer should work, and no new abstractions or APIs should be introduced.
> > The easiest way to do this is to agree that the streamer is not transactional
> > and use a special version as Igor proposed. In future releases, when IEP-22 is
> > implemented, it will become transactional with the help of an exclusive table
> > lock. In more distant releases we will think about separate optimizations for
> > continuous streaming and possibly other cases.
> >
> > Makes sense?
> >
> > Vladimir.
> >
> >
> > On Fri, Jul 13, 2018 at 11:30 PM Denis Magda <dmagda@apache.org> wrote:
> >
> > > Agree that initial loading and real-time streaming should be seen as
> > > different use cases.
> > >
> > > For the loading part, I would borrow ideas from the direct data load IEP [1].
> > > Ignite should assume that no app works with the cluster until it's
> > > preloaded. So, no global locks or things like that. Just fasten your seat
> > > belt and feed data to your nodes.
> > >
> > > For the streaming part, I would consider 2 or 3 proposed by Igor.
> > >
> > > --
> > > Denis
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
> > >
> > > On Fri, Jul 13, 2018 at 10:03 AM Seliverstov Igor <gvvinblade@gmail.com> wrote:
> > >
> > > > Ivan,
> > > >
> > > > Anyway, DataStreamer is the fastest way to deliver data to a data node;
> > > > the question is how to apply it correctly.
> > > >
> > > > I don’t think we need one more tool that is 90% the same as DataStreamer.
> > > >
> > > > All we need is just to implement a couple of new stream receivers.
> > > >
> > > > Regards,
> > > > Igor
> > > >
> > > > > On 13 July 2018, at 9:56, Павлухин Иван <vololo100@gmail.com> wrote:
> > > > >
> > > > > Hi Igniters,
> > > > >
> > > > > I had a look into IgniteDataStreamer. As far as I understand, currently it
> > > > > just works incorrectly for MVCC tables. This appears to be a blocker for
> > > > > releasing MVCC. The simplest thing is to refuse to create a streamer for
> > > > > MVCC tables.
> > > > >
> > > > > The next step could be teasing apart the related use cases. For me, initial
> > > > > load and continuous streaming look like quite different cases, and it is
> > > > > better to keep them separate at least at the API level. Perhaps it is better
> > > > > to separate the API based on user experience. For example, DataStreamer could
> > > > > be considered a tool without surprises (which means always leaving data
> > > > > consistent, with transactions). And let's say BulkLoader is a beast for the
> > > > > fastest data loading but full of surprises. Such surprises could be locking
> > > > > tables, rolling back user transactions and so on. So, it is of very limited
> > > > > use (like initial load). Keeping API entities separate looks better to me
> > > > > than introducing multiple modes, because separate entities are easier to
> > > > > understand and so less prone to user mistakes.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > >
> > > >
> > >
> >
>
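
The four-step load sequence from the quoted 2018-07-14 message (exclusive table lock, load bypassing the usual internals, index rebuild, lock release) can be illustrated with a toy in-memory table. This is only a sketch under stated assumptions: every name below (BulkLoadSketch, bulkLoad, rows, index) is hypothetical, nothing here is Ignite API, and it does not reflect how IEP-22 is or will be implemented. A plain list stands in for data pages and a TreeMap stands in for a B-tree index.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical toy table; illustration only, not an Ignite class. */
public class BulkLoadSketch {
    /** Unordered row storage, standing in for raw data pages. */
    private final List<Map.Entry<Integer, String>> rows = new ArrayList<>();

    /** Sorted index over the rows, standing in for a B-tree. */
    private final TreeMap<Integer, String> index = new TreeMap<>();

    /** The write lock plays the role of the exclusive table lock. */
    private final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();

    /** Loads a batch while holding the table exclusively. */
    public void bulkLoad(Map<Integer, String> batch) {
        tableLock.writeLock().lock();               // 1) obtain exclusive table lock
        try {
            // 2) append rows directly, with no per-row index maintenance
            for (Map.Entry<Integer, String> e : batch.entrySet())
                rows.add(new AbstractMap.SimpleImmutableEntry<>(e));

            // 3) rebuild the index in a single pass over all rows
            index.clear();
            for (Map.Entry<Integer, String> row : rows)
                index.put(row.getKey(), row.getValue());
        } finally {
            tableLock.writeLock().unlock();         // 4) release the lock
        }
    }

    /** Indexed read; blocks while a bulk load holds the write lock. */
    public String get(int key) {
        tableLock.readLock().lock();
        try {
            return index.get(key);
        } finally {
            tableLock.readLock().unlock();
        }
    }

    /** Number of distinct keys currently visible through the index. */
    public int size() {
        tableLock.readLock().lock();
        try {
            return index.size();
        } finally {
            tableLock.readLock().unlock();
        }
    }
}
```

Readers in this sketch never observe a half-built index, because they wait on the read lock until the load and rebuild finish. That is the trade-off the thread discusses: the load is simple and fast, but the table is unavailable to other work while it runs.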
