accumulo-user mailing list archives

From Josh Elser
Subject Re: Tserver kills themselves from lost Zookeeper locks
Date Tue, 12 Nov 2013 19:26:34 GMT
Was there actually an 11 second delay in the tserver's debug log 
(2:00:51 to 2:01:02) or did you omit some log statements?
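
For reference, the gaps in question can be checked with a few lines of Python. This is only a sketch: `gap_seconds` is a hypothetical helper (not part of Accumulo), it assumes both timestamps fall on the same day, and comparing the master's timestamp with the tserver's additionally assumes the two hosts' clocks are in sync.

```python
from datetime import datetime

# Format of the time-of-day prefix seen in the quoted log lines (HH:MM:SS,mmm).
LOG_TIME_FORMAT = "%H:%M:%S,%f"


def gap_seconds(earlier: str, later: str) -> float:
    """Return the seconds elapsed between two same-day log timestamps."""
    start = datetime.strptime(earlier, LOG_TIME_FORMAT)
    end = datetime.strptime(later, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Last tserver DEBUG line before the failure vs. the FATAL "lost lock" line.
tserver_gap = gap_seconds("02:00:51,580", "02:01:02,267")

# Master's "Finished gathering information" line vs. the tserver FATAL line
# (only meaningful if both hosts' clocks agree).
master_gap = gap_seconds("02:01:02,168", "02:01:02,267")

print(f"tserver gap: {tserver_gap:.3f} s")
print(f"master-to-exit gap: {master_gap:.3f} s")
```

With the timestamps quoted in this thread, the first gap is a little under 11 seconds and the second is roughly a tenth of a second.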

The log messages in your original email also showed MultiScanSession(s) 
immediately before the ZK lock was lost.

Can you give us any information about the type of query workload you're 
servicing here? A MultiScanSession is the equivalent of a "piece" of a 
BatchScanner running against a tserver. Are you doing any sort of heavy 
workload in a SortedKeyValueIterator running on these tservers?

On 11/12/13, 9:36 AM, buttercream wrote:
> I increased all of the servers up to 32GB of memory and confirmed that I have
> the flags that you mentioned in the env file. Unfortunately within a day I
> lost one of the tservers. In the tserver logs, looking at the timestamps
> leading up to the event, I see:
> 02:00:03,835 [cache.LruBlockCache]
> 02:00:51,580 [tabletserver.TabletServer] DEBUG: MultiScanSess
> 02:01:02,267 [tabletserver.TabletServer] FATAL: Lost tablet server lock
> (reason = LOCK_DELETED), exiting.
> What's interesting on this one is that in the master log file, there is no
> error message at that time. What I do see is this:
> 02:01:02,168 [master.Master] DEBUG: Finished gathering information from 2
> servers in 0.01 seconds
> That would mean the tserver killed itself within milliseconds of the master
> getting the information successfully. Any thoughts on this one?