ignite-dev mailing list archives

From Raul Kripalani <ra...@apache.org>
Subject Re: Data Snapshots in Ignite
Date Mon, 26 Oct 2015 12:31:46 GMT
Hi,

Thanks all for chiming in. It seems like this feature could be of interest
to the user community, so I've opened a ticket to continue maturing the
idea there:

https://issues.apache.org/jira/browse/IGNITE-1789

We may need to create a Wiki page later to collaborate around specifics and
design.

Regards,

*Raúl Kripalani*
PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
Messaging Engineer
http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
http://blog.raulkr.net | twitter: @raulvk

On Wed, Oct 21, 2015 at 10:06 AM, Raul Kripalani <raulk@apache.org> wrote:

> Hey guys,
>
> LevelDB has a feature called Snapshots, which provides a consistent
> read-only view of the DB at a given point in time, against which queries
> can be executed.
>
> To my knowledge, this functionality doesn't exist in the world of open
> source In-Memory Computing. Ignite could be an innovator here.
>
> Ignite Snapshots would allow queries, distributed closures, map-reduce
> jobs, etc. It could be useful for Spark RDDs to avoid data shift while the
> computation is taking place (not sure if there's already some form of
> snapshotting, though). Same for IGFS.
>
> Example usage:
>
>     IgniteCacheSnapshot snapshot =
>         ignite.cache("mycache").snapshots().create();
>
>     // all three queries are executed against a view of the cache at the
>     // point in time where it was snapshotted
>     snapshot.query("select ...");
>     snapshot.query("select ...");
>     snapshot.query("select ...");
>
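
To make the intended semantics concrete, here is a minimal single-node sketch of a point-in-time view. It is not Ignite code and not the proposed API: VersionedStore and StoreSnapshot are hypothetical names, and the approach (keep every version of a value, tag each write with a sequence number, and let a snapshot read "as of" its sequence number) is only one possible way to get this behaviour. It assumes a single thread, has no deletes, and never discards old versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy store: keeps every version of each value, keyed by write sequence. */
class VersionedStore<K, V> {
    private final Map<K, TreeMap<Long, V>> versions = new HashMap<>();
    private long sequence = 0;

    /** Records a new version of the value under the next sequence number. */
    void put(K key, V value) {
        sequence++;
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(sequence, value);
    }

    /** Reads the latest version. */
    V get(K key) {
        return readAt(key, sequence);
    }

    /** Captures the current sequence number; no data is copied. */
    StoreSnapshot<K, V> snapshot() {
        return new StoreSnapshot<>(this, sequence);
    }

    /** Returns the newest version written at or before the given sequence, or null. */
    V readAt(K key, long seq) {
        TreeMap<Long, V> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, V> entry = history.floorEntry(seq);
        return entry == null ? null : entry.getValue();
    }
}

/** Read-only view of a VersionedStore as of a fixed sequence number. */
class StoreSnapshot<K, V> {
    private final VersionedStore<K, V> store;
    private final long sequence;

    StoreSnapshot(VersionedStore<K, V> store, long sequence) {
        this.store = store;
        this.sequence = sequence;
    }

    V get(K key) {
        return store.readAt(key, sequence);
    }
}

class SnapshotDemo {
    public static void main(String[] args) {
        VersionedStore<String, Integer> store = new VersionedStore<>();
        store.put("a", 1);
        StoreSnapshot<String, Integer> snap = store.snapshot();
        store.put("a", 2);   // written after the snapshot
        store.put("b", 3);   // written after the snapshot

        System.out.println(snap.get("a"));  // prints 1
        System.out.println(snap.get("b"));  // prints null
        System.out.println(store.get("a")); // prints 2
    }
}
```

A real implementation would also have to decide when old versions can be dropped, which is where the retention concern raised further down comes in.
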
> In fact, it would be awesome to be able to logically save this snapshot
> with a name so that later jobs, queries, etc. can run on top of it, e.g.:
>
>     IgniteCacheSnapshot snapshot =
>         ignite.cache("mycache").snapshots().create("abc");
>
>     // ...
>     // in another module of a distributed system, or in another thread in
>     // parallel, use the saved snapshot
>     IgniteCacheSnapshot snapshot =
>         ignite.cache("mycache").snapshots().get("abc");
>     ....
>
> Named snapshotting can be dangerous due to data retention, e.g. imagine
> keeping a snapshot for 2 weeks! So we should force the user to specify a
> TTL:
>
>     IgniteCacheSnapshot snapshot =
>         ignite.cache("mycache").snapshots().create("abc", 2, TimeUnit.HOURS);
>
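
As a rough illustration of the named-snapshot-with-TTL idea, the sketch below keeps snapshots in a registry that refuses to store anything without a positive TTL and drops entries lazily once they have expired. Everything here is an assumption for illustration: NamedSnapshotRegistry is a hypothetical name, the snapshot type is left generic, expiry is checked only on lookup, and the clock is injected so the behaviour can be tested without waiting.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical registry of named snapshots; every entry must carry a TTL. */
class NamedSnapshotRegistry<S> {

    private static final class Entry<S> {
        final S snapshot;
        final long expiresAtNanos;

        Entry(S snapshot, long expiresAtNanos) {
            this.snapshot = snapshot;
            this.expiresAtNanos = expiresAtNanos;
        }
    }

    private final Map<String, Entry<S>> entries = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;

    /** The clock is injected (e.g. System::nanoTime) so expiry is testable. */
    NamedSnapshotRegistry(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Stores a snapshot under a name; a positive TTL is mandatory. */
    void create(String name, S snapshot, long ttl, TimeUnit unit) {
        Objects.requireNonNull(snapshot, "snapshot");
        if (ttl <= 0) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        long expiresAt = nanoClock.getAsLong() + unit.toNanos(ttl);
        entries.put(name, new Entry<>(snapshot, expiresAt));
    }

    /** Returns the snapshot if it exists and has not expired; expired entries are removed. */
    Optional<S> get(String name) {
        Entry<S> entry = entries.get(name);
        if (entry == null) {
            return Optional.empty();
        }
        if (nanoClock.getAsLong() - entry.expiresAtNanos >= 0) {
            entries.remove(name, entry); // lazy eviction on lookup
            return Optional.empty();
        }
        return Optional.of(entry.snapshot);
    }
}

class NamedSnapshotDemo {
    public static void main(String[] args) {
        NamedSnapshotRegistry<String> registry = new NamedSnapshotRegistry<>(System::nanoTime);
        registry.create("abc", "snapshot-payload", 2, TimeUnit.HOURS);
        System.out.println(registry.get("abc").isPresent());     // prints true
        System.out.println(registry.get("missing").isPresent()); // prints false
    }
}
```

In a cluster the expiry would presumably need to be coordinated across nodes and tied to releasing the retained data, which this single-JVM sketch does not attempt.
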
> Such functionality would allow for "reporting checkpoints" and "time
> travel", for example, where you want users to be able to query the data as
> it stood 1 hour ago, 2 hours ago, etc.
>
> What do you think?
>
> P.S.: We do have some form of snapshotting in the Compute checkpointing
> functionality – but my proposal is to generalise the notion.
>
> Regards,
>
> *Raúl Kripalani*
> PMC & Committer @ Apache Ignite, Apache Camel | Integration, Big Data and
> Messaging Engineer
> http://about.me/raulkripalani | http://www.linkedin.com/in/raulkripalani
> http://blog.raulkr.net | twitter: @raulvk
>
