zookeeper-user mailing list archives

From VALLASTER Stefan <Stefan.VALLAS...@frequentis.com>
Subject RE: IllegalStateException: instance must be started before calling this method
Date Wed, 11 Apr 2018 06:40:45 GMT
Hi,

getting back to this topic in more detail. The "container" meant here is a normal Java VM running specific middleware that offers a container concept; it has nothing to do with Docker.

The config is read from disk.

Best regards
Stefan

-----Original Message-----
From: Yuval Dori [mailto:yuval.dori@gigaspaces.com] 
Sent: Tuesday, 10 April 2018 14:34
To: user@zookeeper.apache.org
Cc: LEONHARTSBERGER Stefan
Subject: Re: IllegalStateException: instance must be started before calling this method

Hi,

I answered the following on March 26:

Hi Martin,

Not in Docker. We are using the word "container" for one of our services.

Thanks,

Yuval




> On Apr 10, 2018, at 2:40 PM, Martin Gainty <mgainty@hotmail.com> wrote:
> 
> MG>waiting for Yuval to answer:
> 
> 
> in a Docker container?
> if so, how do you resolve hosts (.hosts file, Kubernetes, Swarm)?
> 
> 
> MG>?
> 
> ________________________________
> From: VALLASTER Stefan <Stefan.VALLASTER@frequentis.com>
> Sent: Tuesday, April 10, 2018 4:46 AM
> To: user@zookeeper.apache.org
> Subject: RE: IllegalStateException: instance must be started before calling this method
> 
> How can we move this topic forward again? The thread seems to have stalled for no obvious reason. Is anyone willing to help us out here again?
> 
> Best regards
> Stefan
> 
> -----Original Message-----
> From: Martin Gainty [mailto:mgainty@hotmail.com]
> Sent: Sunday, 25 March 2018 22:58
> To: user@zookeeper.apache.org
> Subject: Re: IllegalStateException: instance must be started before calling this method
> 
> 
> 
> in a Docker container?
> 
> if so, how do you resolve hosts (.hosts file, Kubernetes, Swarm)?
> 
> M
> ________________________________
> From: Yuval Dori <yuval.dori@gigaspaces.com>
> Sent: Sunday, March 25, 2018 5:34 AM
> To: user@zookeeper.apache.org
> Cc: Yuval Dori
> Subject: IllegalStateException: instance must be started before calling this method
> 
> Hi,
> 
> Our customer is using ZK 3.4.3 and got IllegalStateException: instance must be started before calling this method
> 
> 
> with the following stack trace:
> 2018-02-15T09:58:12.094+0100; ERROR; WSOSTSLXWIT01/MANAGER; P3424/T194; [SPACE/LearnerHandler-/10.17.46.142:49336/LearnerHandler]; Unexpected exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
> at java.net.SocketInputStream.read(SocketInputStream.java:171)
> at java.net.SocketInputStream.read(SocketInputStream.java:141)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
> at java.io.DataInputStream.readInt(DataInputStream.java:387)
> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
> at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:542)
> 
> 
> This is the description:
> 
> At first ps-main.1 is started as primary on WSOSTSLXBMS01 GSC 3.
> 
> 24.01.2018 11:31:04,947 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/1
Space instance is Primary
> 
> After some relocation of ps-main instances, a new backup instance is started on the same GSC.
> 
> 24.01.2018 11:32:59,886 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T146 SPACE/GS-LRMI Connection-pool-1-thread-6/1
Space instance [ frqMain_container1:frqMain ] has been selected as Backup
> 
> Later a failover test is performed by shutting down WSOSTSLXBMS02 and WSOSTSLXMSS02.
> We would expect the ps-main.1 backup to be elected as primary, because the former primary was located on WSOSTSLXBMS02, but no election happened. In the end there was no primary of ps-main.1 running anywhere in the cluster.
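>
> For reference, the election behind the "Primary"/"Backup" log lines above appears to use the Curator LeaderSelector recipe (the threads are named Curator-LeaderSelector-0). A minimal sketch of that pattern, with placeholder connect string, ZK path and listener body that are not taken from our deployment:
>
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.CuratorFrameworkFactory;
> import org.apache.curator.framework.recipes.leader.LeaderSelector;
> import org.apache.curator.framework.recipes.leader.LeaderSelectorListenerAdapter;
> import org.apache.curator.retry.ExponentialBackoffRetry;
>
> public class PrimaryElectionSketch {
>     public static void main(String[] args) throws Exception {
>         CuratorFramework client = CuratorFrameworkFactory.newClient(
>                 "zk-host1:2181,zk-host2:2181",              // placeholder connect string
>                 new ExponentialBackoffRetry(1000, 3));
>         client.start();                                     // must be STARTED before any recipe uses it
>
>         LeaderSelector selector = new LeaderSelector(client, "/election/ps-main.1",
>                 new LeaderSelectorListenerAdapter() {
>                     @Override
>                     public void takeLeadership(CuratorFramework c) throws Exception {
>                         // The instance acts as primary for as long as this method does not return.
>                         Thread.currentThread().join();
>                     }
>                 });
>         selector.autoRequeue();   // rejoin the election after leadership is lost
>         selector.start();
>     }
> }
>
> Leadership is held for as long as takeLeadership() blocks; when the session is lost or the client is closed, it is given up.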
> 
> We found a suspicious exception in the logs:
> 24.01.2018 11:32:44,987 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/frqMain_container1:frqMain
Lost leadership [sessionId=0x36128253b820001]
> 24.01.2018 11:32:44,987 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T98 SPACE/Curator-Framework-0/CuratorFrameworkImpl
backgroundOperationsLoop exiting
> 24.01.2018 11:32:44,990 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/1
Quiesce state set to SUSPENDED
> 24.01.2018 11:32:44,991 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/ZooKeeper
Session: 0x36128253b820001 closed
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
The leader threw an exception
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
java.lang.IllegalStateException: instance must be started before calling this method
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:176)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:359)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:339)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:123)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
at java.lang.Thread.run(Thread.java:748)
> 24.01.2018 11:32:44,991 +01:00 ERROR WSOSTSLXBMS01/GSC/3 P1280/T120 SPACE/Curator-LeaderSelector-0/LeaderSelector
> 24.01.2018 11:32:44,991 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T97 SPACE/GS-LRMI Connection-pool-1-thread-1-EventThread/ClientCnxn
EventThread shut down for session: 0x36128253b820001
> 24.01.2018 11:32:44,997 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/frqMain1_1
Channel is closed [target=frqMain1_1, target url=jini://*/frqMain_container1_1/frqMain?groups=POLBAY&ignoreValidation=true&schema=persistent&id=1&total_members=4,1&versioned=true&mirror=true&cluster_schema=partitioned&locators=WSOSTSLXMSS01,WSOSTSLXMSS02&state=started&timeout=5000,
target machine connection url=NIO://WSOSTSLXBMS02:11002/pid[3756]/24795468496_3_2538032931939568226_details[class
com.gigaspaces.internal.cluster.node.impl.router.AbstractConnectionProxyBasedReplicationRouter$ConnectionEndpoint(frqMain_container1_1:frqMain)]]
> 24.01.2018 11:32:44,998 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/ps-main-mirror-service
Channel is closed [target=ps-main-mirror-service, target url=jini://*/ps-main-mirror-service_container/ps-main-mirror-service?schema=persistent&id=1&total_members=4,1&versioned=true&mirror=true&groups=POLBAY&cluster_schema=partitioned&locators=WSOSTSLXMSS01,WSOSTSLXMSS02&state=started&timeout=5000,
target machine connection url=NIO://WSOSTSLXBMS01:11002/pid[3220]/25906827215_3_7196393537085178059_details[class
com.gigaspaces.internal.cluster.node.impl.router.AbstractConnectionProxyBasedReplicationRouter$ConnectionEndpoint(ps-main-mirror-service_container:ps-main-mirror-service)]]
> 24.01.2018 11:32:45,767 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/1
Shutdown complete
> 24.01.2018 11:32:45,768 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/container
Container <frqMain_container1> shutdown completed
> 
> 24.01.2018 11:32:46,037 +01:00 WARN WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/PUServiceBeanImpl
Failed to delete deployed processing unit from [C:\Program Files\Frequentis\Frequentis LifeX
Platform\gigaspaces\work\processing-units\ps-main_1_2008048111]
> 24.01.2018 11:32:46,037 +01:00 INFO WSOSTSLXBMS01/GSC/3 P1280/T24 SPACE/GS-LRMI Connection-pool-1-thread-1/PUServiceBeanImpl
Stopped
> 
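> Looking at the stack trace itself: the IllegalStateException is raised by the state check in CuratorFrameworkImpl.delete() while the LeaderSelector is deleting its lock node to release leadership, i.e. the CuratorFramework is no longer in the STARTED state at that point. The "Session: 0x36128253b820001 closed" line logged just beforehand suggests the client had already been closed as part of the container shutdown. A minimal sketch of that situation (connect string and node path are made-up placeholders):
>
> import org.apache.curator.framework.CuratorFramework;
> import org.apache.curator.framework.CuratorFrameworkFactory;
> import org.apache.curator.retry.RetryOneTime;
>
> public class ClosedClientSketch {
>     public static void main(String[] args) throws Exception {
>         CuratorFramework client = CuratorFrameworkFactory.newClient(
>                 "zk-host1:2181", new RetryOneTime(1000));   // placeholder connect string
>         client.start();
>         // ... leader-selector work ...
>         client.close();   // the framework leaves the STARTED state here
>
>         // Any further operation on the closed client -- e.g. the lock-node delete that
>         // LeaderSelector performs when releasing leadership -- fails the precondition in
>         // CuratorFrameworkImpl.delete() with:
>         //   java.lang.IllegalStateException: instance must be started before calling this method
>         client.delete().forPath("/election/ps-main.1/lock-0000000001");   // hypothetical path
>     }
> }
>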
> Thread dump attached. The failover happened around 24.01.2018 11:40:20,893 +01:00
> WARN WSOSTSLXBMS01/GSC/3 P1280/T59 SPACE/GS-Watchdog/watchdog The ServerEndPoint [WSOSTSLXMSS02/10.17.46.142:11001] is not reachable or is reachable but with no matching invocation in progress at the server peer, closing invalid connection with local address [/10.17.46.136:49864] [ remote invocation of: net.jini.lookup.DiscoveryAdmin.getMemberGroups] [java.net.SocketTimeoutException]
> 
> 
> Could this be a ZK bug?
> 
> Thanks,
> 
> Yuval
> 
> 

