ignite-user mailing list archives

From keinproblem <nol...@qq.com>
Subject Volatile Kubernetes Node Discovery
Date Tue, 02 May 2017 19:20:27 GMT
Dear Apache Ignite Users Community,

This may be a well-known problem, but the currently available information
does not provide enough help to solve it.

Inside my service I'm using an Ignite /IgniteCache/ in /Replicated/ mode.
Some replicas of this service run inside Kubernetes in the form of Pods.
I'm using the /TcpDiscoveryKubernetesIpFinder/ for the Node Discovery.
As I understood it: each Pod is able to make an API call to the Kubernetes API
and retrieve the list of currently available nodes. This works properly,
even though the Pod's own IP will also be retrieved, which produces a
seemingly harmless warning.
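
To be explicit about the discovery setup, this is roughly how the IP finder
gets pointed at the Kubernetes service; the service name and namespace below
are placeholders, not my actual values:

    import org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder;

    // Sketch only: service name and namespace are placeholder values.
    final TcpDiscoveryKubernetesIpFinder ipFinder = new TcpDiscoveryKubernetesIpFinder();
    ipFinder.setServiceName("ignite");  // Kubernetes service whose endpoints list the Pod IPs
    ipFinder.setNamespace("default");   // namespace the Pods run in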

Here is how I get my /IgniteCache/ and the /IgniteConfiguration/ used:

    public IgniteCache<String, MyCacheObject> getCacheInstance() {
        // Cache name is a placeholder; the cache runs in REPLICATED mode as mentioned above.
        final CacheConfiguration<String, MyCacheObject> cacheConfiguration =
                new CacheConfiguration<>("myCache");
        cacheConfiguration.setCacheMode(CacheMode.REPLICATED);
        return ignite.getOrCreateCache(cacheConfiguration);
    }

    public static IgniteConfiguration getDefaultIgniteConfiguration() {
        final IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setGridLogger(new Slf4jLogger(log));

        final TcpDiscoveryKubernetesIpFinder kubernetesPodIpFinder =
                new TcpDiscoveryKubernetesIpFinder();
        final TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();

        tcpDiscoverySpi.setIpFinder(kubernetesPodIpFinder);
        tcpDiscoverySpi.setLocalPort(47500); // static port, to decrease potential failure causes
        cfg.setDiscoverySpi(tcpDiscoverySpi);
        return cfg;
    }
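
The node itself is then started with this configuration, essentially like so:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;

    // Start a node with the configuration above; the resulting Ignite
    // instance is the one used by getCacheInstance().
    final Ignite ignite = Ignition.start(getDefaultIgniteConfiguration());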

The initial node will start up properly every time.

In most cases, roughly the third node trying to connect will fail and get
restarted by Kubernetes after some time. Sometimes this node will succeed in
connecting to the cluster after a few restarts, but the common case is that
the nodes keep restarting forever.

But the major issue is that when a new node fails to connect to the cluster,
the cluster seems to become unstable: the number of nodes increases for a
very short time, then drops to the previous count or even lower.
I am not sure whether those are the newly connecting nodes losing their
connection again immediately, or whether the previously connected nodes lose
theirs.
I also deployed the bare Ignite Docker image, including a configuration for
/TcpDiscoveryKubernetesIpFinder/ as described here:
<https://apacheignite.readme.io/docs/kubernetes-deployment>
Even with this minimal setup, I've experienced the same behavior.

There is no load on the Ignite Nodes and the network usage is very low.

Using another Kubernetes instance on different infrastructure showed the same
results, hence I assume this to be an Ignite-related issue.

I also tried increasing the specific timeouts such as /ackTimeout/,
/sockTimeout/ etc.
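
For illustration, the adjustments looked roughly like this; the concrete
values are examples, not the exact ones I tested:

    // Example values only; I tried several combinations well above the defaults.
    tcpDiscoverySpi.setAckTimeout(10_000);     // ms
    tcpDiscoverySpi.setSocketTimeout(10_000);  // ms
    tcpDiscoverySpi.setNetworkTimeout(10_000); // ms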

Using the /TcpDiscoveryVmIpFinder/, with all endpoints obtained via DNS, did
not help either; the behavior was the same as described above.
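
That attempt looked roughly like this; the addresses below are placeholders
for the endpoints I actually resolved via DNS:

    import java.util.Arrays;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    // Placeholder addresses; the real list came from DNS lookups of the service endpoints.
    final TcpDiscoveryVmIpFinder vmIpFinder = new TcpDiscoveryVmIpFinder();
    vmIpFinder.setAddresses(Arrays.asList("ignite-0.ignite:47500", "ignite-1.ignite:47500"));
    tcpDiscoverySpi.setIpFinder(vmIpFinder);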

Please find attached a log file providing information on WARN level. Please
let me know if DEBUG level is desired.

Kind regards and thanks in advance,

