accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Fwd: why compaction failure on one table brings other tables offline, how to recover
Date Tue, 12 Apr 2016 15:11:27 GMT
Jayesh Patel wrote:
> Josh, The OOM tserver process was killed by the kernel, it didn't hang
> around.  I tried restarting it manually, but it ran out of memory right
> away and was killed again leaving the tablet offline.  It must have a
> huge "recovery" log to go through.  HDFS
> /accumulo/wal/instance-accumulo+9997/24e08581-a081-4b41-afc5-d75bdda6cf15 is
> about 42MB, and machine has about 300MB free and apparently not enough
> for tserver.
>

Ok, cool. If you're that constrained on resources, you can also try
reducing the property tserver.sort.buffer.size in accumulo-site.xml. It
defaults to 200M; you could try 25M or 50M instead.

This is the size of the buffer used for sorting log edits during the
recovery process. Reducing it might help if the tserver keeps running
out of memory before it makes it through recovery.
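
As a rough sketch, the override in accumulo-site.xml might look like the
following. The 50M value is just one of the two sizes suggested above, not
a tested recommendation, and the tserver presumably needs a restart to
pick it up:

```xml
<configuration>
  <!-- Buffer used to sort write-ahead log edits during recovery.
       Default is 200M; 50M is an illustrative value for a
       memory-constrained machine. -->
  <property>
    <name>tserver.sort.buffer.size</name>
    <value>50M</value>
  </property>
</configuration>
```

If your accumulo-site.xml already has a <configuration> element, add only
the <property> block inside it.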

300MB is a little low in general as far as headroom goes (especially 
when you're already not giving Accumulo enough RAM). Typically, you want 
to ensure that you give the operating system at least 1G of memory for 
itself.
