ignite-dev mailing list archives

From Raul Kripalani <ra...@apache.org>
Subject Re: Data Snapshots in Ignite
Date Thu, 22 Oct 2015 12:38:29 GMT
Hey Andre,

I think I answered some of your questions in my response to Dmitriy [1].
Could you please have a look and tell me if it answers your questions?

N.B.: My idea is based on the typical use case for LevelDB snapshots,
but we might create something entirely different in Ignite if the community
wants to.

[1]
http://apache-ignite-developers.2346864.n4.nabble.com/Data-Snapshots-in-Ignite-tp4183p4220.html

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
http://blog.raulkr.net | twitter: @raulvk

On Thu, Oct 22, 2015 at 12:49 PM, Andrey Kornev <andrewkornev@hotmail.com>
wrote:

> Hello,
>
> Just a few questions.
>
> 1) It's not clear from the proposed API how to capture/retrieve a
> consistent snapshot of multiple caches. If my query involves a join I'd
> like to ensure consistency across all join participants.
> 2) Implementation-wise, is the snapshot just a physical copy of all cache
> entries and their indexes? Or is some other mechanism being considered?
> 3) Isolation: is the snapshot isolated with respect to concurrent
> modifications?
> 4) Serialization: what are my options to ensure that I can still read the
> data from the old snapshots as my key/value class definitions change over
> time?
>
> I feel I do not quite understand the specific use case this feature is
> expected to serve. Why is keeping a snapshot for 2 weeks unimaginable,
> but 1 or 2 hours is OK?
>
> Also, I think forcing people to set a TTL on a snapshot is pointless and
> will be abused by setting it to an unreasonably large value, just in case.
>
> Thanks
> Andrey
>
> > From: raulk@apache.org
> > Date: Wed, 21 Oct 2015 10:06:25 +0100
> > Subject: Data Snapshots in Ignite
> > To: dev@ignite.apache.org
> >
> > Hey guys,
> >
> > LevelDB has a feature called Snapshots, which provides a consistent
> > read-only view of the DB at a given point in time, against which queries
> > can be executed.
> >
> > To my knowledge, this functionality doesn't exist in the world of open
> > source In-Memory Computing. Ignite could be an innovator here.
> >
> > Ignite Snapshots would allow queries, distributed closures, map-reduce
> > jobs, etc. It could be useful for Spark RDDs to avoid data shift while the
> > computation is taking place (not sure if there's already some form of
> > snapshotting, though). Same for IGFS.
> >
> > Example usage:
> >
> >     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().create();
> >
> >     // all three queries are executed against a view of the cache at the
> >     // point in time where it was snapshotted
> >     snapshot.query("select ...");
> >     snapshot.query("select ...");
> >     snapshot.query("select ...");
> >
> > In fact, it would be awesome to be able to logically save this snapshot
> > with a name so that later jobs, queries, etc. can run on top of it, e.g.:
> >
> >     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().create("abc");
> >
> >     // ...
> >     // in another module of a distributed system, or in another thread in
> >     // parallel, use the saved snapshot
> >     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().get("abc");
> >     ....
> >
> > Named snapshotting can be dangerous due to data retention, e.g. imagine
> > keeping a snapshot for 2 weeks! So we should force the user to specify a
> > TTL:
> >
> >     IgniteCacheSnapshot snapshot = ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
> >
> > Such functionality would allow for "reporting checkpoints" and "time
> > travel", for example, where you want users to be able to query the data as
> > it stood 1 hour ago, 2 hours ago, etc.
> >
> > What do you think?
> >
> > P.S.: We do have some form of snapshotting in the Compute checkpointing
> > functionality – but my proposal is to generalise the notion.
> >
> > Regards,
> >
> > *Raúl Kripalani*
>
>
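
Editorial illustration (not part of the original thread): the named snapshot with a TTL proposed above can be modelled with a toy, single-JVM sketch. Everything here is hypothetical, including the `SnapshotRegistry` class and its `create`/`get` methods; none of it is an Ignite API, and the thread does not say how Ignite would implement snapshots. The sketch simply copies the live map on creation and treats a snapshot as gone once its TTL has elapsed, using an injectable clock so that expiry can be shown without waiting.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical toy model of named, TTL-bound snapshots. Not an Ignite API. */
final class SnapshotRegistry<K, V> {
    /** A read-only point-in-time copy together with its expiry deadline. */
    private static final class Entry<K, V> {
        final Map<K, V> view;
        final long expiresAtNanos;

        Entry(Map<K, V> view, long expiresAtNanos) {
            this.view = view;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<K, V> live;
    private final LongSupplier nanoClock; // injectable so expiry is testable
    private final Map<String, Entry<K, V>> snapshots = new ConcurrentHashMap<>();

    SnapshotRegistry(Map<K, V> live, LongSupplier nanoClock) {
        this.live = live;
        this.nanoClock = nanoClock;
    }

    /**
     * Copies the live map and registers the copy under the given name.
     * Assumption: a full copy is the simplest mechanism; the copy is not
     * atomic with respect to concurrent writers of the live map.
     */
    Map<K, V> create(String name, long ttl, TimeUnit unit) {
        Map<K, V> view = Collections.unmodifiableMap(new HashMap<>(live));
        long deadline = nanoClock.getAsLong() + unit.toNanos(ttl);
        snapshots.put(name, new Entry<>(view, deadline));
        return view;
    }

    /** Returns the named snapshot, or null if it is unknown or has expired. */
    Map<K, V> get(String name) {
        Entry<K, V> e = snapshots.get(name);
        if (e == null) {
            return null;
        }
        if (nanoClock.getAsLong() - e.expiresAtNanos >= 0) {
            snapshots.remove(name, e); // lazily drop the expired snapshot
            return null;
        }
        return e.view;
    }
}

final class SnapshotRegistryDemo {
    public static void main(String[] args) {
        Map<String, Integer> live = new ConcurrentHashMap<>();
        live.put("a", 1);

        long[] now = {0L}; // fake clock, advanced by hand
        SnapshotRegistry<String, Integer> registry =
            new SnapshotRegistry<>(live, () -> now[0]);

        registry.create("abc", 2, TimeUnit.HOURS);
        live.put("a", 2); // later write; the snapshot must not see it

        System.out.println(registry.get("abc").get("a")); // prints 1

        now[0] = TimeUnit.HOURS.toNanos(2); // the TTL has now elapsed
        System.out.println(registry.get("abc")); // prints null
    }
}
```

A full copy keeps the sketch short but doubles memory per snapshot, which is one reading of the data-retention concern raised in the proposal. Whether a real implementation would copy data or use versioning is exactly what question 2 in the reply asks, and the thread leaves it open.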
