ignite-dev mailing list archives

From Alexandr Kuramshin <ein.nsk...@gmail.com>
Subject Re: IgniteCache.loadCache improvement proposal
Date Tue, 22 Nov 2016 12:43:18 GMT
Val, Yakov,

Sorry for the delay; I needed time to think and to run some tests.

In any case, extending the API and supplying a default implementation is
good. It makes the framework more flexible and usable.

But the extension you propose will not solve the problem that I raised.
Please read the following carefully.

The current implementation of IgniteCache.loadCache causes parallel
execution of IgniteCache.localLoadCache on each node in the cluster. It's a
bad implementation, but it has the *right semantics*.

You propose to extend IgniteCache.localLoadCache and use it to load data on
all the nodes. Those are bad semantics, and they also lead to a bad
implementation. Here is why.

When you filter the data with the supplied IgniteBiPredicate, the predicate
may access related data that must be co-located with the entry. Hence, to
load the data for all the nodes from one node, you need access to all the
related data, which is partitioned across the cluster. This leads to heavy
network overhead and overloaded near caches.

That is also why I am surprised that the IgniteBiPredicate is executed for
every key supplied by Cache.loadCache, and not only for the keys that will
be stored on this node.

In conclusion, my opinion is as follows.

localLoadCache should first filter each key by the affinity function and
the current cache topology, *then* invoke the predicate, and then store the
entity in the cache (possibly by invoking the supplied closure). All
associated partitions should be locked for the duration of the load.
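
To illustrate the ordering proposed above, here is a minimal sketch in plain
Java. It does not use the Ignite API: ownerOf is a hypothetical stand-in for
the affinity function, the store is an ordinary Map, and partition locking is
left out. It only shows that the predicate runs after the ownership check, so
it is never invoked for keys that belong to another node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiPredicate;

class AffinityFirstLoad {
    /** Hypothetical stand-in for an affinity function: key -> index of the owning node. */
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /**
     * Loads into a local map only the records owned by localNode.
     * The predicate is applied after the ownership check, not before it.
     */
    static <K, V> Map<K, V> localLoad(Map<K, V> store, int localNode, int nodeCount,
                                      BiPredicate<K, V> filter) {
        Map<K, V> local = new HashMap<>();
        for (Map.Entry<K, V> e : store.entrySet()) {
            // Step 1: skip keys owned by another node before any predicate work.
            if (ownerOf(e.getKey(), nodeCount) != localNode)
                continue;
            // Step 2: only now evaluate the (possibly expensive) predicate.
            if (filter == null || filter.test(e.getKey(), e.getValue()))
                local.put(e.getKey(), e.getValue()); // Step 3: store the entry.
        }
        return local;
    }

    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();
        for (int i = 0; i < 12; i++)
            store.put(i, "v" + i);

        // Count predicate invocations to see how many keys it was asked about.
        AtomicInteger calls = new AtomicInteger();
        BiPredicate<Integer, String> evenOnly = (k, v) -> {
            calls.incrementAndGet();
            return k % 2 == 0;
        };

        Map<Integer, String> local = localLoad(store, 1, 3, evenOnly);
        System.out.println("loaded keys: " + local.keySet());
        System.out.println("predicate calls: " + calls.get() + " of " + store.size() + " records");
    }
}
```

With 12 records and 3 nodes in this toy setup, the predicate is evaluated
only for the records the local node owns rather than for all 12.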

IgniteCache.loadCache should perform Cache.loadCache on one node (or a few
nodes), then transfer the entities to the remote nodes, and only *then*
invoke the predicate and the closure on the remote nodes.
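
The cluster-wide flow described above could look roughly like the following
plain-Java sketch. It is an illustration under assumptions, not Ignite code:
ownerOf is a hypothetical affinity stand-in, the "transfer" is just handing a
batch to a method, and the closure is omitted so that only the ordering
(read once, group by owner, filter on the owner) is visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

class LoadOnceDistribute {
    /** Hypothetical stand-in for an affinity function: key -> index of the owning node. */
    static int ownerOf(Object key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    /** On the loading node: read the store once and group the records by owning node. */
    static <K, V> List<Map<K, V>> groupByOwner(Map<K, V> store, int nodeCount) {
        List<Map<K, V>> batches = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++)
            batches.add(new HashMap<>());
        for (Map.Entry<K, V> e : store.entrySet())
            batches.get(ownerOf(e.getKey(), nodeCount)).put(e.getKey(), e.getValue());
        return batches;
    }

    /** On the owning node: apply the predicate to the received batch and keep what passes. */
    static <K, V> Map<K, V> applyOnOwner(Map<K, V> batch, BiPredicate<K, V> filter) {
        Map<K, V> cache = new HashMap<>();
        batch.forEach((k, v) -> {
            if (filter.test(k, v))
                cache.put(k, v);
        });
        return cache;
    }

    public static void main(String[] args) {
        Map<Integer, String> store = new HashMap<>();
        for (int i = 0; i < 12; i++)
            store.put(i, "v" + i);

        int nodeCount = 3;
        List<Map<Integer, String>> batches = groupByOwner(store, nodeCount);

        // Each "remote node" filters only the batch it received.
        for (int n = 0; n < nodeCount; n++) {
            Map<Integer, String> cache = applyOnOwner(batches.get(n), (k, v) -> k % 2 == 0);
            System.out.println("node " + n + " received " + batches.get(n).size()
                + " records, stored " + cache.size());
        }
    }
}
```

Because the predicate runs on the owning node, any co-located data it needs
would be local there, which is the point of moving the filtering step after
the transfer.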

2016-11-22 2:16 GMT+03:00 Valentin Kulichenko <valentin.kulichenko@gmail.com>:

> Guys,
> I created a ticket for this:
> https://issues.apache.org/jira/browse/IGNITE-4255
> Feel free to provide comments.
> -Val
> On Sat, Nov 19, 2016 at 6:56 AM, Yakov Zhdanov <yzhdanov@apache.org>
> wrote:
> > >
> > >
> > > Why not store the partition ID in the database and query only local
> > > partitions? Whatever approach we design with a DataStreamer will be
> > > slower than this.
> > >
> >
> > Because this can be some generic DB. Imagine the app migrating to IMDG.
> >
> > I am pretty sure that in many cases approach with data streamer will be
> > faster and in many cases approach with multiple queries will be faster.
> > And the choice should depend on many factors. I like Val's suggestions. I
> > think he goes in the right direction.
> >
> > --Yakov
> >

Alexandr Kuramshin
