tomcat-users mailing list archives

From Pid <...@pidster.com>
Subject Re: BackupManager vs DeltaManager
Date Tue, 02 Nov 2010 09:02:24 GMT
On 01/11/2010 14:44, Ossi wrote:
> On Fri, Oct 29, 2010 at 1:51 PM, Pid <pid@pidster.com> wrote:
> 
>> On 29/10/2010 11:17, Ossi wrote:
>>> Hi!
>>>
>>> Should BackupManager work well with any number of nodes?
>>
>> Yes.
>>
>>> And with large clusters it should work even better than DeltaManager?
>>
>> Yes.  *Should*.
>>
>>> We have large production clusters (10+ nodes) and we have been evaluating
>>> whether we can use BackupManager.
>>>
>>> In a test cluster of 6 nodes it didn't work too well: much higher request
>>> latency, with logs full of the following errors:
>>>
>>> 2010-09-24 14:17:34,536 ERROR [tomcat-processor-53]
>>> (org.apache.catalina.tribes.tipis.AbstractReplicatedMap) Unable to
>>> replicate out data for a LazyReplicatedMap.get operation
>>> org.apache.catalina.tribes.ChannelException: Operation has timed
>>> out(3000 ms.).; Faulty members:tcp://{10, 1, 8, 219}:4200;
>>
>> It's timing out for some reason.  You could try increasing the timeout.
>>
> 
> 
> Yes, I noticed that. However, it is using the same configs as with
> DeltaManager, and we didn't get
> those errors with that.

It'll be a bit tedious, but it might be beneficial to look at a tcpdump
trace of the connection traffic to see what's happening.

> What could be the reason for those timeouts? How can we find out which
> operation is causing the timeout? For example, is it in the
> initialization/starting phase (so it couldn't connect
> at all), or is something in replication just taking a lot of time?

BackupManager doesn't replicate to the whole cluster; it replicates each
session to one designated backup node.  It does replicate the map of
where all the sessions are to the whole cluster, however.

Maybe it's the latter which is a problem.
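
To make the difference concrete, here is a rough sketch of who receives what
when one session changes. It is only a toy model, not Tomcat code: the class
and method names are made up for illustration, and picking "the next node in
the ring" as the backup is an assumption, not necessarily how BackupManager
chooses its backup node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of replication fan-out. All names here are hypothetical.
public class ReplicationFanout {

    // DeltaManager-style: every node except the primary gets the session data.
    public static List<Integer> deltaTargets(int clusterSize, int primary) {
        List<Integer> targets = new ArrayList<>();
        for (int node = 0; node < clusterSize; node++) {
            if (node != primary) {
                targets.add(node);
            }
        }
        return targets;
    }

    // BackupManager-style: exactly one backup node gets the session data.
    // ASSUMPTION: the backup is simply the next node in the ring.
    public static int backupTarget(int clusterSize, int primary) {
        if (clusterSize < 2) {
            throw new IllegalArgumentException("need at least two nodes");
        }
        return (primary + 1) % clusterSize;
    }

    // BackupManager-style: the remaining nodes only learn where the session
    // lives (the map entry), not the session data itself.
    public static List<Integer> locationOnlyTargets(int clusterSize, int primary) {
        int backup = backupTarget(clusterSize, primary);
        List<Integer> targets = new ArrayList<>();
        for (int node = 0; node < clusterSize; node++) {
            if (node != primary && node != backup) {
                targets.add(node);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        int clusterSize = 6; // same size as the test cluster in this thread
        int primary = 0;
        System.out.println("Delta, full data to:   " + deltaTargets(clusterSize, primary));
        System.out.println("Backup, full data to:  " + backupTarget(clusterSize, primary));
        System.out.println("Backup, map entry to:  " + locationOnlyTargets(clusterSize, primary));
    }
}
```

Either way every node still hears about each session; with BackupManager
most of them just get the (much smaller) location entry rather than the data.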

> I'll test this with different timeouts.
>
>> Does this occur on all cluster members, or just a few?
> 
> Sorry, I don't remember. It has been a while since we did those tests, and
> apparently the logs are gone.
> I'll check this the next time I test.

OK.  Let us know.


p
