ignite-user mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: Ignite 2.3 - replicated cache lost data after restart cluster nodes with persistence enabled
Date Thu, 26 Oct 2017 22:48:33 GMT
Dmitriy,

I don’t see why the result of a simple query such as “select count(*) from t;” should
be different while rebalancing is in progress or after a cluster restart. Ignite’s SQL engine
claims that it is fault-tolerant and returns a consistent result set at all times unless a
partition loss has happened. Here we don’t have a partition loss, so it seems we caught a
bug.

Vladimir O., please chime in.

—
Denis

> On Oct 26, 2017, at 3:34 PM, Dmitry Pavlov <dpavlov.spb@gmail.com> wrote:
> 
> Hi Denis 
> 
> It seems to me that this is not a bug for my scenario, because the data was not loaded
> within a single transaction using a transactional cache. In this case it is OK that cache
> data is rebalanced according to partition update counters, isn't it?
> 
> I suppose the data was not lost in this case; it just was not completely transferred
> to the second node.
> 
> Sincerely, 
> 
> On Thu, Oct 26, 2017 at 21:09, Denis Magda <dmagda@apache.org> wrote:
> + dev list
> 
> This scenario has to be handled automatically by Ignite. Seems like a bug. Please refer
> to the initial description of the issue. Alex G, please have a look:
> 
> To reproduce (a code sketch follows the steps):
> 1. Create a replicated cache with multiple indexed types, with some indexes
> 2. Start the first server node
> 3. Insert data into the cache (1,000,000 entries)
> 4. Start the second server node
> 
> At this point everything seems OK; the data is apparently rebalanced successfully,
> judging by SQL queries (count(*))
> 
> 5. Stop the server nodes
> 6. Restart the server nodes
> 7. Running SQL queries (count(*)) now returns fewer rows
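> 
> Here is a minimal sketch of the steps above (assuming the Ignite 2.3 persistence API;
> the cache name and the Person type with @QuerySqlField-annotated fields are
> illustrative):
> 
>     import org.apache.ignite.*;
>     import org.apache.ignite.cache.CacheMode;
>     import org.apache.ignite.cache.query.SqlFieldsQuery;
>     import org.apache.ignite.configuration.*;
> 
>     // Steps 1-3: one server node with native persistence and a replicated,
>     // indexed cache loaded with 1,000,000 entries.
>     IgniteConfiguration cfg = new IgniteConfiguration();
>     DataStorageConfiguration ds = new DataStorageConfiguration();
>     ds.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);
>     cfg.setDataStorageConfiguration(ds);
> 
>     Ignite ignite = Ignition.start(cfg);
>     ignite.active(true); // a cluster with persistence starts deactivated
> 
>     CacheConfiguration<Long, Person> cacheCfg =
>         new CacheConfiguration<Long, Person>("PERSONS")
>             .setCacheMode(CacheMode.REPLICATED)
>             .setIndexedTypes(Long.class, Person.class);
>     IgniteCache<Long, Person> cache = ignite.getOrCreateCache(cacheCfg);
> 
>     for (long i = 0; i < 1_000_000; i++)
>         cache.put(i, new Person(i, "p" + i));
> 
>     // Steps 4 and 7: the count(*) check run before and after the restart.
>     long cnt = (Long) cache.query(new SqlFieldsQuery(
>         "select count(*) from Person")).getAll().get(0).get(0);
>     System.out.println("count = " + cnt);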
> 
> —
> Denis
> 
> > On Oct 23, 2017, at 5:11 AM, Dmitry Pavlov <dpavlov.spb@gmail.com> wrote:
> >
> > Hi,
> >
> > I tried to write code that executes the described scenario. The results
> > are as follows:
> > If I do not give the cluster enough time to completely rebalance partitions, the newly
> > started node will not yet hold all the data, and a count(*) query returns a smaller
> > number: only the number of records that have already been transferred to that node.
> > I guess GridDhtPartitionDemandMessage entries can be found in the Ignite debug log at
> > that moment.
> >
> > If I wait for a sufficient amount of time, or explicitly wait for rebalancing on the
> > newly joined node with
> > ignite2.cache(CACHE).rebalance().get();
> > then all results are correct.
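> >
> > For example (a sketch; ignite2 is the Ignite instance on the newly joined node and
> > "PERSONS" is an illustrative cache name):
> >
> >     // Start the second server node and block until it has finished
> >     // rebalancing the cache; after that count(*) returns the full result.
> >     Ignite ignite2 = Ignition.start(cfg);
> >     ignite2.cache("PERSONS").rebalance().get();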
> >
> > About your question: what happens if one cluster node crashes in the middle
> > of the rebalancing process?
> > In this case the normal failover scenario starts and the data is rebalanced within the
> > cluster. If the nodes hold enough WAL records to cover the history from the crash point,
> > only the recent changes (a delta) are sent over the network. If there is not enough
> > history to rebalance using only the most recent changes, the partition is rebalanced
> > from scratch to the new node.
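> >
> > For reference, the amount of checkpoint history kept in the WAL for such delta
> > rebalancing can be tuned; a sketch (the value is illustrative):
> >
> >     // Keep more checkpoint history in the WAL so that a restarting node
> >     // can be caught up with a delta instead of a full rebalance.
> >     DataStorageConfiguration ds = new DataStorageConfiguration();
> >     ds.setWalHistorySize(100); // default is 20 checkpoints
> >     cfg.setDataStorageConfiguration(ds);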
> >
> > Sincerely,
> > Pavlov Dmitry
> >
> >
> > On Sat, Oct 21, 2017 at 2:07, Manu <maxnu00@hotmail.com> wrote:
> > Hi,
> >
> > after the restart, the data does not seem to be consistent.
> >
> > We waited until rebalancing was fully completed before restarting the
> > cluster, to check that durable memory data rebalancing works correctly and SQL
> > queries still work.
> > Another question (it's not this case): what happens if one cluster node
> > crashes in the middle of the rebalancing process?
> >
> > Thanks!
> >
> >
> >
> > --
> > Sent from: http://apache-ignite-users.70518.x6.nabble.com/

