ignite-user mailing list archives

From Denis Mekhanikov <dmekhani...@gmail.com>
Subject Re: Re: What's the best practice to init the cache when in cluster env?
Date Tue, 26 Sep 2017 09:01:46 GMT
Well, if you cannot know when the topology is complete, then you cannot
guarantee that no rebalancing will happen.

If backups are not configured, then data that moved to other nodes will be
removed from the initial node. Rebalancing happens according to the
configured affinity function
<https://apacheignite.readme.io/docs/affinity-collocation#affinity-function>.
By default it is RendezvousAffinityFunction
<https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/rendezvous/RendezvousAffinityFunction.html>,
which aims to minimize data transmission between nodes.
You can implement your own affinity function that works better for your case
by using additional knowledge about your topology.
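To make the rebalancing behaviour more concrete, here is a simplified, stdlib-only Java sketch of rendezvous (highest-random-weight) hashing, the idea RendezvousAffinityFunction is named after. It is not Ignite's code: the scoring hash, the node names and the partition count are assumptions chosen for illustration. It shows why the approach keeps data movement low: when a node joins, each partition either stays where it was or moves to the new node, so nothing is shuffled between the nodes that were already there.

```java
import java.util.*;

/**
 * Simplified rendezvous (highest-random-weight) hashing: each partition is owned by the
 * node with the highest score(partition, node). Illustration only; NOT Ignite's
 * RendezvousAffinityFunction implementation, and the hash below is a stand-in.
 */
final class RendezvousSketch {

    // Stand-in mixing function (SplitMix64 finalizer) over the partition and node id.
    static long score(int partition, String nodeId) {
        long z = nodeId.hashCode() * 0x9E3779B97F4A7C15L + partition;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Primary owner of one partition among the given nodes. */
    static String owner(int partition, Collection<String> nodes) {
        String best = null;
        long bestScore = Long.MIN_VALUE;
        for (String n : nodes) {
            long s = score(partition, n);
            // Tie-break on node id so the result does not depend on iteration order.
            if (best == null || s > bestScore || (s == bestScore && n.compareTo(best) > 0)) {
                best = n;
                bestScore = s;
            }
        }
        return best;
    }

    /** Owner of every partition 0..partitions-1. */
    static Map<Integer, String> assign(int partitions, Collection<String> nodes) {
        Map<Integer, String> owners = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            owners.put(p, owner(p, nodes));
        }
        return owners;
    }

    public static void main(String[] args) {
        int partitions = 1024; // arbitrary partition count for this sketch
        Map<Integer, String> before = assign(partitions, List.of("node-A", "node-B", "node-C"));
        Map<Integer, String> after = assign(partitions, List.of("node-A", "node-B", "node-C", "node-D"));

        int moved = 0;
        for (int p = 0; p < partitions; p++) {
            if (!before.get(p).equals(after.get(p))) {
                moved++;
                // A partition can only move TO the joining node, never between old nodes.
                if (!after.get(p).equals("node-D")) {
                    throw new AssertionError("unexpected move of partition " + p);
                }
            }
        }
        System.out.println("partitions moved to node-D: " + moved + " of " + partitions);
    }
}
```

With four nodes, roughly a quarter of the partitions would be expected to move to node-D, but the exact count depends on the stand-in hash. A custom affinity function, as suggested above, would replace `owner(...)` with logic that uses what you know about your own topology.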

Denis

On Tue, 26 Sep 2017 at 11:46, aaron@tophold.com <aaron@tophold.com> wrote:

> Thanks Denis,
>
> The really tough issue is that we are not sure when the entire cluster
> will be ready, since we may add or remove nodes at runtime.
>
> Another question: if I load the data once on the first started node, then
> after the other nodes come up and rebalancing completes, will the primary
> node evict the entries that no longer belong to it?
>
> Because we regularly run aggregations locally on each node, we do not want
> all of this load to fall on the first node.
>
>
> Regards
> Aaron
> ------------------------------
> aaron@tophold.com
>
>
> *From:* Denis Mekhanikov <dmekhanikov@gmail.com>
> *Date:* 2017-09-25 19:46
> *To:* user <user@ignite.apache.org>
> *Subject:* Re: What's the best practice to init the cache when in cluster env?
>
> Hi Aaron!
>
> There are two good options for data loading: using DataStreamer or
> IgniteCache.loadCache(...)
> <https://apacheignite.readme.io/docs/3rd-party-store#section-loadcache->.
> The second option is a good fit when the initial data is stored in a database.
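Since Aaron's rows already carry a partition ID, one possible refinement with loadCache(...) is partition-aware loading: each node reads only the rows for the partitions it owns instead of scanning the whole table. The sketch below is stdlib-only and hypothetical: the table and column names, the Row shape, and the assumption that the stored partition ID equals Ignite's partition for the key are all illustrative. In a real CacheStore the local partition set would come from the Affinity API (for example `ignite.affinity(cacheName).primaryPartitions(ignite.cluster().localNode())`), and the query would run through JDBC.

```java
import java.util.*;
import java.util.stream.Collectors;

final class PartitionAwareLoad {

    /** Hypothetical row shape: each record is stored with a partition id. */
    record Row(long id, int partitionId, String payload) {}

    /**
     * Builds a parameterized query that reads only the given partitions.
     * Table and column names ("entries", "part_id") are hypothetical.
     */
    static String queryFor(SortedSet<Integer> localPartitions) {
        if (localPartitions.isEmpty()) {
            // "IN ()" is not valid SQL; a node owning no partitions should simply skip loading.
            throw new IllegalArgumentException("no local partitions");
        }
        String placeholders = localPartitions.stream()
                .map(p -> "?")
                .collect(Collectors.joining(", "));
        return "SELECT id, part_id, payload FROM entries WHERE part_id IN (" + placeholders + ")";
    }

    /** In-memory equivalent of the query above, standing in for a real database. */
    static List<Row> localRows(List<Row> all, Set<Integer> localPartitions) {
        return all.stream()
                .filter(r -> localPartitions.contains(r.partitionId()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> table = List.of(
                new Row(1, 0, "a"), new Row(2, 1, "b"), new Row(3, 2, "c"), new Row(4, 1, "d"));
        SortedSet<Integer> mine = new TreeSet<>(List.of(1, 2)); // partitions this node owns (assumed)
        System.out.println(queryFor(mine));
        System.out.println(localRows(table, mine));
    }
}
```

Each node then loads a disjoint slice of the table, so no single node has to read (and later hand off) the whole data set.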
>
> If you are worried about rebalancing overhead, you can start the cluster
> and begin streaming data only once all nodes are up. In this case records
> will land on their final nodes right away, without needing to move to other
> nodes.
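The "start streaming once all nodes are up" approach can be sketched as a simple gate that polls the server count before any data is streamed. This is a hedged, stdlib-only sketch: the IntSupplier is a hypothetical stand-in for a real topology check such as `ignite.cluster().forServers().nodes().size()`, and the expected count, timeout and poll interval are assumptions. As Denis points out in his later reply at the top of this thread, if the final node count is not known in advance, no gate of this kind can rule out later rebalancing.

```java
import java.time.Duration;
import java.util.function.IntSupplier;

final class TopologyGate {

    /**
     * Polls until at least `expected` server nodes are reported, or the timeout expires.
     * Returns true if the expected topology was reached, false on timeout.
     */
    static boolean awaitServers(IntSupplier serverCount, int expected,
                                Duration timeout, Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (serverCount.getAsInt() >= expected) {
                return true;  // expected topology reached: safe point to start streaming
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: caller decides whether to stream anyway
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster: one more server "joins" each time the count is checked.
        int[] joined = {1};
        boolean ready = awaitServers(() -> joined[0]++, 4, Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println(ready
                ? "all servers up: start the data streamer"
                : "timed out: topology still incomplete");
    }
}
```

Polling keeps the sketch simple; listening for node-join discovery events would avoid the sleep loop but needs more wiring.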
>
> Denis
>
> On Mon, 25 Sep 2017 at 14:31, aaron@tophold.com <aaron@tophold.com> wrote:
>
>> Hi all,
>>
>> Suppose we have a dozen nodes caching millions of records from a DB.
>>
>> During initialization, what's the best way to load that data? We use the
>> data streamer to load it, and every entry includes a partition ID when it
>> is inserted into the DB.
>>
>> Because the nodes are started one by one, loading everything from one node
>> and then rebalancing seems impractical and wasteful.
>>
>> I'm not sure whether there are any guidelines or best practices for such a
>> scenario.
>>
>> Thanks for your time!
>>
>>
>> Regards
>> Aaron
>> ------------------------------
>> aaron@tophold.com
>>
>
