jakarta-jcs-users mailing list archives

From Aaron Smuts <asm...@yahoo.com>
Subject Re: Timeout for determining a remote cache is down, and to reconnect
Date Tue, 27 Jun 2006 18:09:54 GMT
I run a cluster of two remote servers.  I put around
20 million items into the remote server across a few
regions.  I am using the JDBC disk cache backed by
MySQL.  I use different tables for different regions. 
I can retrieve hundreds of items from the cache per
second. 

Adding a node to the cluster imposes a performance
penalty for puts.  But it gives you redundancy and the
ability to spread the load for gets.  

If you have an extremely high hit rate and a low put
rate, then it might make sense to add more nodes to
the remote server cluster.  But you'd only see a
benefit if you configured some of your application
servers to use different remote caches as the primary.
That is, you could have a list of 4 remote servers,
ordered differently in each of your clients.
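To make the "ordered differently in your various clients" idea concrete, here is a minimal sketch that rotates one shared server list by a client index, so each client gets a different primary and the rest follow as failovers. The class `ServerOrdering`, the method `orderFor`, and the host names are hypothetical; this is not part of JCS, and how the resulting list is fed into a client's remote cache configuration depends on your JCS version.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of JCS): gives each client its own
 * ordering of the same remote cache servers by rotating the list, so
 * different clients treat different servers as their primary.
 */
public final class ServerOrdering {

    private ServerOrdering() {
    }

    /** Returns the servers rotated left by clientIndex positions. */
    public static List<String> orderFor(List<String> servers, int clientIndex) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        int n = servers.size();
        // floorMod keeps the shift in [0, n) even for a negative index.
        int shift = Math.floorMod(clientIndex, n);
        List<String> ordered = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            ordered.add(servers.get((i + shift) % n));
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Hypothetical host names for a 4-server cluster.
        List<String> servers = List.of("cache1:1102", "cache2:1102", "cache3:1102", "cache4:1102");
        for (int client = 0; client < 6; client++) {
            // The first entry is this client's primary; the rest are failovers, in order.
            System.out.println("client " + client + ": " + String.join(",", orderFor(servers, client)));
        }
    }
}
```

With 6 application servers and 4 cache servers, clients 0 and 4 share a primary, as do clients 1 and 5, so the read load is spread across all four nodes instead of landing on one.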


You will probably need only two, depending on your
load.

Also, I'm working on a simple way for you to be able
to partition data across different remote nodes.  For
instance, if you had numeric keys and two clusters,
you could put odd keys in one and even keys in the
other.   . . . Basically, this makes the remote
cache infinitely scalable.   . . .
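The odd/even split generalizes to "key modulo the number of clusters". The sketch below assumes numeric keys and shows only that routing rule; `KeyPartitioner` and the cluster names are hypothetical and are not the partitioning feature described above, which was still in progress.

```java
import java.util.List;

/**
 * Hypothetical partitioner (not the JCS API): routes numeric keys to one
 * of several remote cache clusters. With two clusters this is exactly
 * "even keys in one, odd keys in the other".
 */
public final class KeyPartitioner {

    private final List<String> clusters;

    public KeyPartitioner(List<String> clusters) {
        if (clusters.isEmpty()) {
            throw new IllegalArgumentException("need at least one cluster");
        }
        this.clusters = List.copyOf(clusters);
    }

    /** Picks a cluster by key modulo the cluster count; floorMod keeps negative keys in range. */
    public String clusterFor(long key) {
        int index = (int) Math.floorMod(key, (long) clusters.size());
        return clusters.get(index);
    }
}
```

A plain modulo is easy to reason about, but adding a cluster later remaps most existing keys; consistent hashing is the usual alternative when the number of clusters is expected to grow.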

Aaron


--- Paul.Lewandowski@kohls.com wrote:

> Thanks.  I will be implementing a failover cluster
> shortly and realize that this will minimize the
> impact.  In production, I will have a total of 4
> remote caching servers, with 6 application servers
> pointing to one as primary.  Do you have a
> recommendation on whether I should configure just one
> failover or use all 4?
> 
> Thanks.
> 
> Paul
> 
> From: Aaron Smuts <asmuts@yahoo.com>
> Date: 06/27/2006 12:25 PM
> To: JCS Users List <jcs-users@jakarta.apache.org>
> Subject: Re: Timeout for determining a remote cache is down, and to reconnect
> Please respond to: "JCS Users List" <jcs-users@jakarta.apache.org>
> 
> You can set the timeout for the client using this
> parameter:
> 
> jcs.auxiliary.RC.attributes.RmiSocketFactoryTimeoutMillis=30000
> 
> I need to update the docs.  If the value is -1, the
> cache will not try to set the timeout, so the default
> RMI timeout (60 seconds?) will be used.  If you don't
> specify anything, the cache uses a default of 10
> seconds.
> 
>     /** The default timeout for the custom RMI socket factory */
>     public static final int DEFAULT_RMI_SOCKET_FACTORY_TIMEOUT_MILLIS = 10000;
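The three cases described above (property missing, set to -1, set to a value) can be summarized as a small resolution function. This is a minimal sketch of those stated rules only, not JCS's actual code; the class `RmiTimeoutSetting` and its method are hypothetical, and an empty result stands for "leave the RMI default alone".

```java
import java.util.OptionalInt;
import java.util.Properties;

/**
 * Hypothetical sketch of the timeout rules described in the reply:
 * missing -> 10000 ms default, -1 -> do not set a timeout (RMI default),
 * anything else -> use the configured value. Not the JCS implementation.
 */
public final class RmiTimeoutSetting {

    public static final String KEY = "jcs.auxiliary.RC.attributes.RmiSocketFactoryTimeoutMillis";
    public static final int DEFAULT_MILLIS = 10000;

    private RmiTimeoutSetting() {
    }

    /** Returns the timeout to apply, or empty if the cache should not set one. */
    public static OptionalInt resolve(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            // Nothing configured: fall back to the 10 second default.
            return OptionalInt.of(DEFAULT_MILLIS);
        }
        int value = Integer.parseInt(raw.trim());
        // -1 means "don't touch it", leaving whatever RMI uses by default.
        return value == -1 ? OptionalInt.empty() : OptionalInt.of(value);
    }
}
```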
> 
> There is no way to configure the reconnect interval.
> I will make it configurable.  You should also run a
> failover remote cache server in a cluster.
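A configurable reconnect interval amounts to passing the sleep time in rather than fixing it at 20000 ms. The sketch below is a hypothetical stand-in for that idea; `ReconnectLoop`, its parameters, and the `tryReconnect` callback are assumptions, not the actual `RemoteCacheFailoverRunner` code, and the attempt cap exists only to keep the example bounded.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical reconnect loop with a configurable sleep between attempts,
 * standing in for the fixed 20000 ms sleep seen in the logs.
 * Not the actual RemoteCacheFailoverRunner implementation.
 */
public final class ReconnectLoop {

    private final long intervalMillis;
    private final int maxAttempts;

    public ReconnectLoop(long intervalMillis, int maxAttempts) {
        this.intervalMillis = intervalMillis;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Calls tryReconnect until it returns true or maxAttempts is reached,
     * sleeping intervalMillis between failed attempts.
     * Returns the number of attempts made, or -1 if every attempt failed.
     */
    public int run(BooleanSupplier tryReconnect) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryReconnect.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                TimeUnit.MILLISECONDS.sleep(intervalMillis);
            }
        }
        return -1;
    }
}
```

A shorter interval gets a client back to its primary sooner after a brief outage such as a long GC pause, at the cost of more connection attempts against a server that may really be down.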
> 
> 
> 
> --- Paul.Lewandowski@kohls.com wrote:
> 
> > Some additional information:
> >
> > I looked at my std-err logs after the run and saw
> > the following sequences
> > of messages:
> >
> > [ERROR] RemoteCache - -Disabling remote cache due to error Failed to put 195717::147343 to 1862_objectIDToAttributes
> > [ERROR] RemoteCache - -Disabling remote cache due to error Failed to put 159971247 to 1862_ProductIDToPromotions
> > [ERROR] RemoteCache - -java.rmi.UnmarshalException: Error unmarshaling return header: java.io.InterruptedIOException: Read timed out
> > [ERROR] RemoteCache - -Disabling remote cache due to error Failed to get 194388175 from 1862_contentIDToContent
> > [ERROR] RemoteCache - -java.rmi.UnmarshalException: Error unmarshaling return header: java.io.InterruptedIOException: Read timed out
> > [ERROR] RemoteCache - -java.rmi.UnmarshalException: Error unmarshaling return header: java.io.InterruptedIOException: Read timed out
> > [WARN] RemoteCacheFailoverRunner - -Failed to reconnect to primary server. Cache failover runner is going to sleep for 20000 milliseconds.
> > [WARN] RemoteCacheFailoverRunner - -Failed to reconnect to primary server. Cache failover runner is going to sleep for 20000 milliseconds.
> > [WARN] RemoteCacheFailoverRunner - -Failed to reconnect to primary server. Cache failover runner is going to sleep for 20000 milliseconds.
> >
> > There were three GCs on the remote caching server
> > (3.5 sec, 10.5 sec, and 11.2 sec) that coincide with
> > the 3 occurrences of this type of error message.
> >
> > Paul
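The "Read timed out" errors above are what a client-side socket read timeout looks like: `java.net.SocketTimeoutException` is a subclass of `java.io.InterruptedIOException`, and it is thrown when a blocking read waits longer than the socket's `SO_TIMEOUT`, for example while the server is stopped in a long GC pause. The sketch below shows one way a custom RMI socket factory can apply such a timeout using the standard `java.rmi.server.RMISocketFactory` API. It is an illustration under that assumption, not JCS's own factory; the class name `TimeoutRmiSocketFactory` is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.server.RMISocketFactory;

/**
 * Hypothetical RMI socket factory that bounds both the TCP connect and
 * every blocking read on client sockets. A read that waits longer than
 * timeoutMillis throws java.net.SocketTimeoutException ("Read timed out").
 */
public class TimeoutRmiSocketFactory extends RMISocketFactory {

    private final int timeoutMillis;

    public TimeoutRmiSocketFactory(int timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            // Fail the connect itself if the server does not answer in time.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            // Every subsequent read on this socket gives up after timeoutMillis.
            socket.setSoTimeout(timeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
}
```

Such a factory would be installed once per JVM with `RMISocketFactory.setSocketFactory(new TimeoutRmiSocketFactory(30000))`; the global factory can only be set once. The trade-off is direct: a timeout shorter than the server's worst GC pause will make a healthy but paused server look dead to its clients.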
> >
> > From: Paul.Lewandowski@kohls.com
> > Date: 06/27/2006 11:14 AM
> > To: "JCS Users List" <jcs-users@jakarta.apache.org>
> > Subject: Timeout for determining a remote cache is down, and to reconnect
> > Please respond to: "JCS Users List" <jcs-users@jakarta.apache.org>
> >
> > In my testing I have occasionally seen my
> > application (client side) determine that the remote
> > cache server is down when in fact it is not.  It
> > then enters a retry cycle, waiting 20000 milliseconds
> > before reconnecting.
> >
> > What I believe may be happening is that, because we
> > are using a 2 GB remote cache, when a large GC cycle
> > occurs the client side thinks the remote server is
> > down, depending on the value that determines this.
> > Can you please explain this algorithm and whether it
> > is user configurable?
> >
> > Also, 20000 milliseconds is too long for a retry
> > under heavy production loads.  Is this retry
> > interval also configurable?
> 
=== message truncated ===



