ignite-user mailing list archives

From Denis Magda <dma...@apache.org>
Subject Re: Volatile Kubernetes Node Discovery
Date Wed, 03 May 2017 00:22:21 GMT
> Inside my service I'm using an IgniteCache in /Replicated/ mode from Ignite
> 1.9.
> Some replicas of this service run inside Kubernetes in form of Pods (1
> Container/Pod).
> I'm using the 
> /org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder/
> for the Node Discovery.

Do you mean that part of the cluster is running outside of Kubernetes? If so, that
might be the issue, because containerized Ignite nodes can’t get through the network and
reach your nodes that are outside.

—
Denis

> On May 2, 2017, at 12:20 PM, keinproblem <noli.m@qq.com> wrote:
> 
> Dear Apache Ignite Users Community,
> 
> This may be a well-known problem, although the currently available
> information does not provide enough help for solving this issue.
> 
> Inside my service I'm using an IgniteCache in /Replicated/ mode from Ignite
> 1.9.
> Some replicas of this service run inside Kubernetes in form of Pods (1
> Container/Pod).
> I'm using the 
> /org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder/
> for the Node Discovery.
> As I understand it, each Pod can call the Kubernetes API and retrieve the
> list of currently available nodes. This works properly.
> Even though the Pod's own IP will also be retrieved, which produces a
> somehow harmless 
> 
> Here is how I obtain my /IgniteCache/ and the /IgniteConfiguration/ I use:
> 
>    public IgniteCache<String, MyCacheObject> getCacheInstance() {
>        // Cache of MyCacheObject values keyed by String.
>        final CacheConfiguration<String, MyCacheObject> cacheConfiguration =
>                new CacheConfiguration<>();
>        cacheConfiguration.setName("MyObjectCache");
>        return ignite.getOrCreateCache(cacheConfiguration);
>    }
> 
>    public static IgniteConfiguration getDefaultIgniteConfiguration(){
>        final IgniteConfiguration cfg = new IgniteConfiguration();
>        cfg.setGridLogger(new Slf4jLogger(log));
>        cfg.setClientMode(false);
> 
>        final TcpDiscoveryKubernetesIpFinder kubernetesPodIpFinder =
>                new TcpDiscoveryKubernetesIpFinder();
>        kubernetesPodIpFinder.setServiceName(SystemDataProvider.getServiceNameEnv);
>        final TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
>
>        tcpDiscoverySpi.setIpFinder(kubernetesPodIpFinder);
>        // Use a static port to reduce potential failure causes.
>        tcpDiscoverySpi.setLocalPort(47500);
>        cfg.setFailureDetectionTimeout(90000);
>        cfg.setDiscoverySpi(tcpDiscoverySpi);
>        return cfg;
>    }
> 
> 
> 
> The initial node will start up properly every time.
> 
> In most cases, roughly the third node that tries to connect fails and is
> restarted by Kubernetes after some time. Sometimes this node succeeds in
> connecting to the cluster after a few restarts, but more commonly the
> nodes keep restarting forever.
> 
> But the major issue is that when a new node fails to connect to the cluster,
> the cluster seems to become unstable: the number of nodes increases for a
> very short time, then drops to the previous count or even lower.
> I am not sure whether the newly connecting nodes lose the connection
> again immediately, or whether the previously connected nodes lose their
> connection.
> 
> 
> I also deployed the bare Ignite Docker image, including a configuration for
> the /TcpDiscoveryKubernetesIpFinder/, as described here:
> https://apacheignite.readme.io/docs/kubernetes-deployment
> Even with this minimal setup, I experienced the same behavior.
> 
> There is no load on the Ignite Nodes and the network usage is very low.
> 
> Using another Kubernetes instance on another infrastructure showed the same
> results, hence I assume this to be an Ignite-related issue.
> 
> I also tried increasing specific timeouts such as /ackTimeout/ and
> /sockTimeout/.
> 
> Using the /TcpDiscoveryVmIpFinder/, with all endpoints obtained via DNS,
> did not help either. The behavior was the same as described above.
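
For the /TcpDiscoveryVmIpFinder/ attempt, the DNS step might look roughly like the sketch below. It is not the poster's code: the class `DiscoveryAddresses` and its methods are hypothetical names, the port 47500 is taken from the configuration quoted earlier, and skipping non-IPv4 addresses is a simplifying assumption made only to keep the "ip:port" form unambiguous.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Ignite): turns DNS results into
// "ip:port" strings suitable for a static address list.
final class DiscoveryAddresses {
    private DiscoveryAddresses() {}

    // Formats each IPv4 address as "ip:port", preserving order and
    // dropping duplicates. Non-IPv4 addresses are skipped (assumption).
    static List<String> format(InetAddress[] resolved, int port) {
        List<String> out = new ArrayList<>();
        for (InetAddress address : resolved) {
            if (address instanceof Inet4Address) {
                String entry = address.getHostAddress() + ":" + port;
                if (!out.contains(entry)) {
                    out.add(entry);
                }
            }
        }
        return out;
    }

    // Resolves every address record for the host name, then formats them.
    static List<String> resolve(String host, int port) throws UnknownHostException {
        return format(InetAddress.getAllByName(host), port);
    }

    public static void main(String[] args) throws UnknownHostException {
        // In Kubernetes the host would typically be a service DNS name;
        // "localhost" is only a placeholder default here.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(resolve(host, 47500));
    }
}
```

The resulting list could then be passed to `TcpDiscoveryVmIpFinder.setAddresses(...)`. Note that such a list is a snapshot: Pods that start later are not picked up unless the lookup is repeated.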
> 
> Please find attached a log file providing information on WARN level. Please
> let me know if DEBUG level is desired.
> 
> 
> 
> Kind regards and thanks in advance,
> keinproblem
> 
> 
> 
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Volatile-Kubernetes-Node-Discovery-tp12357.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.

