geronimo-user mailing list archives

From Trygve Hardersen <try...@jotta.no>
Subject 2.2 in production
Date Fri, 30 Oct 2009 16:39:34 GMT
Hello

We have been using Geronimo 2.2-SNAPSHOT in production for a good month now,
and I thought I'd share some experiences with the community, and maybe get
some help. We are an online backup service; check out jottabackup.com if
you're interested.

Generally our experience has been very positive. We're using the GBean
framework for custom server components, the DB connection pools against
MySQL databases, stateless service EJBs and various MDBs, and of course the
web tier (Jetty). Everything is running smoothly, and we're very much looking
forward to 2.2 being released so we can "release" our own software.

The issues we're having are related to WADI clustering with Jetty. First, we
can't use Jetty7 because of GERONIMO-4846, so we're using Jetty6, which works
fine. The more serious issue is that we often cannot update our servers
without downtime. This is what happens:

We have 2 application servers (AS-000 and AS-001) running dynamic WADI HTTP
session replication between them. When updating we first stop one, AS-000 in
this case. That works fine and the active sessions are migrated to AS-001:

23:43:18,160 INFO  [SimpleStateManager]

=============================

New Partition Balancing

Partition Balancing

    Size [24]

    Partition[0] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[1] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[2] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[3] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[4] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[5] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[6] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[7] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[8] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[9] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[10] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[11] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[12] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[13] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[14] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[15] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[16] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[17] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[18] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[19] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[20] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[21] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[22] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]
    Partition[23] owned by [TribesPeer [AS-001; tcp://10.0.10.101:4000]]; version [3]; mergeVersion [0]

=============================


23:43:28,539 INFO  [TcpFailureDetector] Verification complete. Member disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 10, 100}:4000,{10, 0, 10, 100},4000, alive=41104531,id={-4 -32 54 90 -109 -17 65 64 -117 40 -110 -14 36 93 -12 -118 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]]

23:43:28,540 INFO  [ChannelInterceptorBase] memberDisappeared:tcp://{10, 0, 10, 100}:4000


We then update AS-000 and try to start it, but it fails to rejoin the WADI
cluster:


23:46:30,784 INFO  [ReceiverBase] Receiver Server Socket bound to:/10.0.10.100:4000

23:46:30,864 INFO  [ChannelInterceptorBase] memberStart local:org.apache.catalina.tribes.membership.MemberImpl[tcp://10.0.10.100:4000,10.0.10.100,4000, alive=0,id={-103 34 80 -19 68 -51 70 -91 -108 39 -84 65 50 50 103 -107 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ] notify:false peer:AS-000

23:46:30,868 INFO  [McastService] Setting cluster mcast soTimeout to 500

23:46:30,908 INFO  [McastService] Sleeping for 1000 milliseconds to establish cluster membership, start level:4

23:46:31,139 INFO  [ChannelInterceptorBase] memberAdded:tcp://{10, 0, 10, 101}:4000

23:46:31,908 INFO  [McastService] Done sleeping, membership established, start level:4

23:46:31,912 INFO  [McastService] Sleeping for 1000 milliseconds to establish cluster membership, start level:8

23:46:31,927 INFO  [BufferPool] Created a buffer pool with max size:104857600 bytes of type:org.apache.catalina.tribes.io.BufferPool15Impl

23:46:32,912 INFO  [McastService] Done sleeping, membership established, start level:8

23:46:32,912 INFO  [ChannelInterceptorBase] memberStart local:org.apache.catalina.tribes.membership.MemberImpl[tcp://10.0.10.100:4000,10.0.10.100,4000, alive=272,id={-103 34 80 -19 68 -51 70 -91 -108 39 -84 65 50 50 103 -107 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ] notify:false peer:AS-000

23:46:37,848 INFO  [DiscStore] Creating directory: /usr/lib/jotta/jotta-as-prod-1.0-SNAPSHOT/var/temp/SessionStore

23:46:37,930 INFO  [BasicSingletonServiceHolder] [TribesPeer [AS-000; tcp://10.0.10.100:4000]] owns singleton service [PartitionManager for ServiceSpace [/]]

23:46:37,964 INFO  [BasicSingletonServiceHolder] [TribesPeer [AS-000; tcp://10.0.10.100:4000]] resigns ownership of singleton service [PartitionManager for ServiceSpace [/]]

23:47:40,065 ERROR [BasicServiceRegistry] Error while starting [Holder for service [org.codehaus.wadi.location.partitionmanager.SimplePartitionManager@7dc2445f] named [PartitionManager] in space [ServiceSpace [/]]]

org.codehaus.wadi.location.partitionmanager.PartitionManagerException: Partition [0] is unknown.
    at org.codehaus.wadi.location.partitionmanager.SimplePartitionManager.waitForBoot(SimplePartitionManager.java:248)
    at org.codehaus.wadi.location.partitionmanager.SimplePartitionManager.start(SimplePartitionManager.java:119)
    at org.codehaus.wadi.servicespace.basic.BasicServiceHolder.start(BasicServiceHolder.java:60)
    at org.codehaus.wadi.servicespace.basic.BasicServiceRegistry.start(BasicServiceRegistry.java:152)
    at org.codehaus.wadi.servicespace.basic.BasicServiceSpace.start(BasicServiceSpace.java:169)
    at org.apache.geronimo.clustering.wadi.BasicWADISessionManager.doStart(BasicWADISessionManager.java:125)
    at org.apache.geronimo.gbean.runtime.GBeanInstance.createInstance(GBeanInstance.java:948)
    at org.apache.geronimo.gbean.runtime.GBeanInstanceState.attemptFullStart(GBeanInstanceState.java:269)
    at org.apache.geronimo.gbean.runtime.GBeanInstanceState.start(GBeanInstanceState.java:103)
    at org.apache.geronimo.gbean.runtime.GBeanInstanceState.startRecursive(GBeanInstanceState.java:125)
    at org.apache.geronimo.gbean.runtime.GBeanInstance.startRecursive(GBeanInstance.java:538)
    at org.apache.geronimo.kernel.basic.BasicKernel.startRecursiveGBean(BasicKernel.java:377)
    at org.apache.geronimo.kernel.config.ConfigurationUtil.startConfigurationGBeans(ConfigurationUtil.java:456)
    at org.apache.geronimo.kernel.config.ConfigurationUtil.startConfigurationGBeans(ConfigurationUtil.java:493)
    at org.apache.geronimo.kernel.config.KernelConfigurationManager.start(KernelConfigurationManager.java:190)
    at org.apache.geronimo.kernel.config.SimpleConfigurationManager.startConfiguration(SimpleConfigurationManager.java:546)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.geronimo.gbean.runtime.ReflectionMethodInvoker.invoke(ReflectionMethodInvoker.java:34)
    at org.apache.geronimo.gbean.runtime.GBeanOperation.invoke(GBeanOperation.java:130)
    at org.apache.geronimo.gbean.runtime.GBeanInstance.invoke(GBeanInstance.java:815)
    at org.apache.geronimo.gbean.runtime.RawInvoker.invoke(RawInvoker.java:57)
    at org.apache.geronimo.kernel.basic.RawOperationInvoker.invoke(RawOperationInvoker.java:35)
    at org.apache.geronimo.kernel.basic.ProxyMethodInterceptor.intercept(ProxyMethodInterceptor.java:96)
    at org.apache.geronimo.gbean.GBeanLifecycle$$EnhancerByCGLIB$$628b9237.startConfiguration(<generated>)
    at org.apache.geronimo.system.main.EmbeddedDaemon.doStartup(EmbeddedDaemon.java:161)
    at org.apache.geronimo.system.main.EmbeddedDaemon.execute(EmbeddedDaemon.java:78)
    at org.apache.geronimo.kernel.util.MainConfigurationBootstrapper.main(MainConfigurationBootstrapper.java:45)
    at org.apache.geronimo.cli.AbstractCLI.executeMain(AbstractCLI.java:65)
    at org.apache.geronimo.cli.daemon.DaemonCLI.main(DaemonCLI.java:30)

23:47:40,078 ERROR [BasicWADISessionManager] Failed to stop

org.codehaus.wadi.servicespace.ServiceSpaceNotFoundException: ServiceSpaceName not found
    at org.codehaus.wadi.servicespace.basic.BasicServiceSpaceRegistry.unregister(BasicServiceSpaceRegistry.java:55)
    at org.codehaus.wadi.servicespace.basic.BasicServiceSpace.unregisterServiceSpace(BasicServiceSpace.java:228)
    at org.codehaus.wadi.servicespace.basic.BasicServiceSpace.stop(BasicServiceSpace.java:175)
    at org.apache.geronimo.clustering.wadi.BasicWADISessionManager.doFail(BasicWADISessionManager.java:134)
    at org.apache.geronimo.gbean.runtime.GBeanInstance.createInstance(GBeanInstance.java:978)
    at org.apache.geronimo.gbean.runtime.GBeanInstanceState.attemptFullStart(GBeanInstanceState.java:269)
    at org.apache.geronimo.gbean.runtime.GBeanInstanceState.start(GBeanInstanceState.java:103)
    at org.apache.geronimo.gbean.runtime.GBeanInstanceState.startRecursive(GBeanInstanceState.java:125)
    at org.apache.geronimo.gbean.runtime.GBeanInstance.startRecursive(GBeanInstance.java:538)
    at org.apache.geronimo.kernel.basic.BasicKernel.startRecursiveGBean(BasicKernel.java:377)
    at org.apache.geronimo.kernel.config.ConfigurationUtil.startConfigurationGBeans(ConfigurationUtil.java:456)
    at org.apache.geronimo.kernel.config.ConfigurationUtil.startConfigurationGBeans(ConfigurationUtil.java:493)
    at org.apache.geronimo.kernel.config.KernelConfigurationManager.start(KernelConfigurationManager.java:190)
    at org.apache.geronimo.kernel.config.SimpleConfigurationManager.startConfiguration(SimpleConfigurationManager.java:546)
    at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.geronimo.gbean.runtime.ReflectionMethodInvoker.invoke(ReflectionMethodInvoker.java:34)
    at org.apache.geronimo.gbean.runtime.GBeanOperation.invoke(GBeanOperation.java:130)
    at org.apache.geronimo.gbean.runtime.GBeanInstance.invoke(GBeanInstance.java:815)
    at org.apache.geronimo.gbean.runtime.RawInvoker.invoke(RawInvoker.java:57)
    at org.apache.geronimo.kernel.basic.RawOperationInvoker.invoke(RawOperationInvoker.java:35)
    at org.apache.geronimo.kernel.basic.ProxyMethodInterceptor.intercept(ProxyMethodInterceptor.java:96)
    at org.apache.geronimo.gbean.GBeanLifecycle$$EnhancerByCGLIB$$628b9237.startConfiguration(<generated>)
    at org.apache.geronimo.system.main.EmbeddedDaemon.doStartup(EmbeddedDaemon.java:161)
    at org.apache.geronimo.system.main.EmbeddedDaemon.execute(EmbeddedDaemon.java:78)
    at org.apache.geronimo.kernel.util.MainConfigurationBootstrapper.main(MainConfigurationBootstrapper.java:45)
    at org.apache.geronimo.cli.AbstractCLI.executeMain(AbstractCLI.java:65)
    at org.apache.geronimo.cli.daemon.DaemonCLI.main(DaemonCLI.java:30)


After this failure the server stops. Over at the running instance AS-001
this is logged:


23:46:31,909 INFO  [ChannelInterceptorBase] memberAdded:tcp://{10, 0, 10, 100}:4000

23:46:37,929 INFO  [BasicSingletonServiceHolder] [TribesPeer [AS-001; tcp://10.0.10.101:4000]] owns singleton service [PartitionManager for ServiceSpace [/]]

23:46:37,929 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:46:38,438 ERROR [BasicEnvelopeDispatcherManager] problem dispatching message

java.lang.IllegalArgumentException: org.codehaus.wadi.core.store.BasicStoreMotable is not a Session
    at org.codehaus.wadi.replication.manager.basic.SessionStateHandler.newExtractFullStateExternalizable(SessionStateHandler.java:105)
    at org.codehaus.wadi.replication.manager.basic.SessionStateHandler.extractFullState(SessionStateHandler.java:53)
    at org.codehaus.wadi.replication.manager.basic.CreateStorageCommand.execute(CreateStorageCommand.java:45)
    at org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updateSecondaries(SyncSecondaryManager.java:169)
    at org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updateSecondaries(SyncSecondaryManager.java:114)
    at org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updateSecondaries(SyncSecondaryManager.java:103)
    at org.codehaus.wadi.replication.manager.basic.SyncSecondaryManager.updateSecondariesFollowingJoiningPeer(SyncSecondaryManager.java:75)
    at org.codehaus.wadi.replication.manager.basic.ReOrganizeSecondariesListener.receive(ReOrganizeSecondariesListener.java:53)
    at org.codehaus.wadi.servicespace.basic.BasicServiceMonitor.notifyListeners(BasicServiceMonitor.java:124)
    at org.codehaus.wadi.servicespace.basic.BasicServiceMonitor.processLifecycleEvent(BasicServiceMonitor.java:141)
    at org.codehaus.wadi.servicespace.basic.BasicServiceMonitor$ServiceLifecycleEndpoint.dispatch(BasicServiceMonitor.java:148)
    at org.codehaus.wadi.group.impl.ServiceEndpointWrapper.dispatch(ServiceEndpointWrapper.java:50)
    at org.codehaus.wadi.group.impl.BasicEnvelopeDispatcherManager$DispatchRunner.run(BasicEnvelopeDispatcherManager.java:121)
    at org.codehaus.wadi.servicespace.basic.BasicServiceSpaceDispatcher$ExecuteInThread.execute(BasicServiceSpaceDispatcher.java:102)
    at org.codehaus.wadi.group.impl.BasicEnvelopeDispatcherManager.onEnvelope(BasicEnvelopeDispatcherManager.java:100)
    at org.codehaus.wadi.group.impl.AbstractDispatcher.doOnEnvelope(AbstractDispatcher.java:104)
    at org.codehaus.wadi.group.impl.AbstractDispatcher.onEnvelope(AbstractDispatcher.java:100)
    at org.codehaus.wadi.servicespace.basic.ServiceSpaceEndpoint.dispatch(ServiceSpaceEndpoint.java:49)
    at org.codehaus.wadi.group.impl.ServiceEndpointWrapper.dispatch(ServiceEndpointWrapper.java:50)
    at org.codehaus.wadi.group.impl.BasicEnvelopeDispatcherManager$DispatchRunner.run(BasicEnvelopeDispatcherManager.java:121)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)

23:46:40,063 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:46:42,938 WARN  [BasicPartitionBalancerSingletonService] Rebalancing has failed

org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [5000]ms
    at org.codehaus.wadi.group.impl.AbstractDispatcher.attemptMultiRendezVous(AbstractDispatcher.java:174)
    at org.codehaus.wadi.location.balancing.BasicPartitionBalancer.fetchBalancingInfoState(BasicPartitionBalancer.java:85)
    at org.codehaus.wadi.location.balancing.BasicPartitionBalancer.balancePartitions(BasicPartitionBalancer.java:69)
    at org.codehaus.wadi.location.balancing.BasicPartitionBalancerSingletonService.run(BasicPartitionBalancerSingletonService.java:85)
    at java.lang.Thread.run(Thread.java:619)

23:46:42,939 WARN  [BasicPartitionBalancerSingletonService] Will retry rebalancing in [500] ms

23:46:43,439 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:47:40,269 INFO  [TcpFailureDetector] Verification complete. Member disappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 10, 100}:4000,{10, 0, 10, 100},4000, alive=69401,id={-103 34 80 -19 68 -51 70 -91 -108 39 -84 65 50 50 103 -107 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={66 65 66 89 45 65 76 69 88 ...(9)}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]]

23:47:40,271 INFO  [ChannelInterceptorBase] memberDisappeared:tcp://{10, 0, 10, 100}:4000

23:47:40,271 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing


If I try to start AS-000 again, the same thing happens. If we stop AS-001, the
following is logged:


23:49:18,695 INFO  [SimpleStateManager] Evacuating partitions

23:49:18,699 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:49:23,698 WARN  [SimpleStateManager] Partition balancer has disappeared - backing off for [1000]ms

23:49:24,699 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:49:29,698 WARN  [SimpleStateManager] Partition balancer has disappeared - backing off for [1000]ms

23:49:30,699 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:49:35,699 WARN  [SimpleStateManager] Partition balancer has disappeared - backing off for [1000]ms

23:49:36,700 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:49:41,699 WARN  [SimpleStateManager] Partition balancer has disappeared - backing off for [1000]ms

23:49:42,701 INFO  [BasicPartitionBalancerSingletonService] Queueing partition rebalancing

23:49:47,700 WARN  [SimpleStateManager] Partition balancer has disappeared - backing off for [1000]ms

23:49:48,700 INFO  [SimpleStateManager] Evacuated

23:49:48,808 INFO  [AbstractExclusiveContextualiser] Unloaded sessions=[36]

23:49:48,843 INFO  [AbstractExclusiveContextualiser] Unloaded sessions=[13]

23:49:58,852 INFO  [BasicSingletonServiceHolder] [TribesPeer [AS-001; tcp://10.0.10.101:4000]] resigns ownership of singleton service [PartitionManager for ServiceSpace [/]]


However, AS-001 then just hangs, and we have to kill the process to get it
stopped. After this we can start AS-000, update AS-001, and AS-001 always
seems to have no problem joining the cluster thereafter. The strange thing is
that this problem does not always occur; sometimes everything goes fine. I
can't find a consistent pattern. I've tried stopping AS-001 before AS-000,
and I'm sure no serializable object in the session has changed between the
updated and running instances.
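The way I sanity-check that, for what it's worth, is a plain Java serialization round trip, since that is the mechanism session replication ultimately relies on. This is purely my own illustration, not WADI code, and UserProfile is a made-up stand-in for one of our session attribute classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {

    // Hypothetical stand-in for a session attribute class.
    static class UserProfile implements Serializable {
        // A fixed serialVersionUID keeps old and new builds compatible
        // even if the class body changes in compatible ways.
        private static final long serialVersionUID = 1L;
        final String username;
        UserProfile(String username) { this.username = username; }
    }

    // Serialize an object to a byte array, as a replicator would.
    static byte[] toBytes(Serializable o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.close();
        return bos.toByteArray();
    }

    // Deserialize, as the peer receiving the replicated state would.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return ois.readObject();
        } finally {
            ois.close();
        }
    }

    public static void main(String[] args) throws Exception {
        UserProfile original = new UserProfile("trygve");
        UserProfile copy = (UserProfile) fromBytes(toBytes(original));
        System.out.println(copy.username); // prints "trygve"
    }
}
```

If a class without a fixed serialVersionUID changes at all between builds, the new instance cannot deserialize state replicated from the old one, which would look a lot like a failed rejoin.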


My gut feeling is that this is either a concurrency-related bug in WADI or a
network-timeout-related problem. During normal operation I sometimes see
messages like this in the log files:

17:14:08,869 INFO  [TcpFailureDetector] Received memberDisappeared[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 10, 101}:4000,{10, 0, 10, 101},4000, alive=95659954,id={-52 -76 98 22 10 71 76 -72 -122 -59 -21 -29 44 -86 38 114 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]] message. Will verify.

17:14:08,870 INFO  [TcpFailureDetector] Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{10, 0, 10, 101}:4000,{10, 0, 10, 101},4000, alive=95659954,id={-52 -76 98 22 10 71 76 -72 -122 -59 -21 -29 44 -86 38 114 }, payload={-84 -19 0 5 115 114 0 50 111 ...(423)}, command={}, domain={74 79 84 84 65 95 87 65 68 ...(10)}, ]]

And lately, as traffic has increased, errors like this:


16:22:43,524 WARN  [UpdateReplicationCommand] Update has not been properly cascaded due to a communication failure. If a targeted node has been lost, state will be re-balanced automatically.

org.codehaus.wadi.servicespace.ServiceInvocationException: org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [2000]ms
    at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:209)
    at org.codehaus.wadi.replication.storage.ReplicaStorage$$EnhancerByCGLIB$$a901e91b.mergeUpdate(<generated>)
    at org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.cascadeUpdate(UpdateReplicationCommand.java:93)
    at org.codehaus.wadi.replication.manager.basic.UpdateReplicationCommand.run(UpdateReplicationCommand.java:86)
    at org.codehaus.wadi.replication.manager.basic.SyncReplicationManager.update(SyncReplicationManager.java:138)
    at org.codehaus.wadi.replication.manager.basic.LoggingReplicationManager.update(LoggingReplicationManager.java:100)
    at org.codehaus.wadi.core.session.AbstractReplicableSession.onEndProcessing(AbstractReplicableSession.java:49)
    at org.codehaus.wadi.core.session.AtomicallyReplicableSession.onEndProcessing(AtomicallyReplicableSession.java:58)
    at org.apache.geronimo.clustering.wadi.WADISessionAdaptor.onEndAccess(WADISessionAdaptor.java:77)
    at org.apache.geronimo.jetty6.cluster.ClusteredSessionManager.complete(ClusteredSessionManager.java:60)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:198)
    at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.doHandle(ClusteredSessionHandler.java:59)
    at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler$ActualHandler.handle(ClusteredSessionHandler.java:66)
    at org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler$WebClusteredInvocation.invokeLocally(AbstractClusteredPreHandler.java:71)
    at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler$WADIWebClusteredInvocation.access$000(WADIClusteredPreHandler.java:52)
    at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler$WADIWebClusteredInvocation$1.doFilter(WADIClusteredPreHandler.java:64)
    at org.codehaus.wadi.web.impl.WebInvocation.invoke(WebInvocation.java:116)
    at org.codehaus.wadi.core.contextualiser.MemoryContextualiser.handleLocally(MemoryContextualiser.java:71)
    at org.codehaus.wadi.core.contextualiser.AbstractExclusiveContextualiser.handle(AbstractExclusiveContextualiser.java:94)
    at org.codehaus.wadi.core.contextualiser.AbstractMotingContextualiser.contextualise(AbstractMotingContextualiser.java:37)
    at org.codehaus.wadi.core.manager.StandardManager.processStateful(StandardManager.java:150)
    at org.codehaus.wadi.core.manager.StandardManager.contextualise(StandardManager.java:142)
    at org.codehaus.wadi.core.manager.ClusteredManager.contextualise(ClusteredManager.java:81)
    at org.apache.geronimo.jetty6.cluster.wadi.WADIClusteredPreHandler$WADIWebClusteredInvocation.invoke(WADIClusteredPreHandler.java:72)
    at org.apache.geronimo.jetty6.cluster.AbstractClusteredPreHandler.handle(AbstractClusteredPreHandler.java:39)
    at org.apache.geronimo.jetty6.cluster.ClusteredSessionHandler.handle(ClusteredSessionHandler.java:51)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.access$101(TwistyWebAppContext.java:41)
    at org.apache.geronimo.jetty6.handler.TwistyWebAppContext$TwistyHandler.handle(TwistyWebAppContext.java:66)
    at org.apache.geronimo.jetty6.handler.ThreadClassloaderHandler.handle(ThreadClassloaderHandler.java:46)
    at org.apache.geronimo.jetty6.handler.InstanceContextHandler.handle(InstanceContextHandler.java:58)
    at org.apache.geronimo.jetty6.handler.UserTransactionHandler.handle(UserTransactionHandler.java:48)
    at org.apache.geronimo.jetty6.handler.ComponentContextHandler.handle(ComponentContextHandler.java:47)
    at org.apache.geronimo.jetty6.handler.TwistyWebAppContext.handle(TwistyWebAppContext.java:60)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:747)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.apache.geronimo.pool.ThreadPool$1.run(ThreadPool.java:214)
    at org.apache.geronimo.pool.ThreadPool$ContextClassLoaderRunnable.run(ThreadPool.java:344)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.codehaus.wadi.group.MessageExchangeException: No correlated messages received within [2000]ms
    at org.codehaus.wadi.group.impl.AbstractDispatcher.attemptMultiRendezVous(AbstractDispatcher.java:174)
    at org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invokeOnPeers(BasicServiceInvoker.java:90)
    at org.codehaus.wadi.servicespace.basic.BasicServiceInvoker.invoke(BasicServiceInvoker.java:69)
    at org.codehaus.wadi.servicespace.basic.CGLIBServiceProxyFactory$ProxyMethodInterceptor.intercept(CGLIBServiceProxyFactory.java:193)
    ... 49 more


Does anyone have insight into what might be causing this, or know where these
timeouts can be increased, if they are configurable at all?
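For context, here is a minimal sketch of what I understand the failure mode to be. This is my own illustration of a correlated request/reply rendezvous, not WADI's actual code: each request waits for a reply carrying its correlation id, and a reply that arrives after the deadline is simply dropped, so a congested network produces "No correlated messages received within [N]ms" even when both peers are alive.

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative rendezvous: a sender waits a bounded time for the reply
// matching its correlation id; the receiver thread delivers replies.
public class Rendezvous {
    private final Map<String, BlockingQueue<String>> pending = new ConcurrentHashMap<>();

    // Register interest in a correlation id, then wait up to timeoutMs.
    public String exchange(String correlationId, long timeoutMs)
            throws InterruptedException, TimeoutException {
        BlockingQueue<String> slot = new ArrayBlockingQueue<>(1);
        pending.put(correlationId, slot);
        try {
            String reply = slot.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (reply == null) {
                // Nobody answered in time; a late reply will find no waiter.
                throw new TimeoutException(
                        "No correlated messages received within [" + timeoutMs + "]ms");
            }
            return reply;
        } finally {
            pending.remove(correlationId);
        }
    }

    // Called by the message-receiving thread; drops unmatched replies.
    public void onReply(String correlationId, String body) {
        BlockingQueue<String> slot = pending.get(correlationId);
        if (slot != null) {
            slot.offer(body);
        }
    }

    public static void main(String[] args) throws Exception {
        Rendezvous r = new Rendezvous();
        new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
            r.onReply("rebalance-1", "ok");
        }).start();
        System.out.println(r.exchange("rebalance-1", 1000)); // prints "ok"
    }
}
```

If that picture is right, raising the exchange deadline (the [2000]ms and [5000]ms values above) would mask latency spikes, but a genuinely lost message would still fail the exchange.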

I'm thinking that a static WADI configuration might be more stable than the
dynamic setup we have now, which relies on multicasting. Does anyone have
experience with similar setups?

Thanks!

Trygve Hardersen - Jotta
