accumulo-user mailing list archives

From Josh Elser <>
Subject Re: walog consumes all the disk space on power failure
Date Tue, 31 May 2016 22:53:34 GMT
Hi Jayesh,

Can you quantify some rough size numbers for us? Are you seeing 
exceptions in the Accumulo tserver/master logs?

One thought is that when Accumulo creates new WAL files, it sets the 
blocksize to 1 GB (as a trick to force HDFS into making some 
"non-standard" guarantees for us). As a result, it can appear that 
there are a number of very large WAL files even though they're essentially 
empty; checking actual usage (e.g. `hdfs dfs -du` on the walog directory) 
can help distinguish apparent size from real disk consumption.

If your instance is in some situation where Accumulo is repeatedly 
failing to write to a WAL, it might think the WAL is bad, abandon it, 
and try to create a new one. If that happens on every attempt, it could 
explain the situation you described. However, in that case you should 
see the TabletServers complaining loudly that they cannot write to the WALs.
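A quick way to check for that loop is to count WAL write failures in the tserver logs. This is only a sketch: the log path, logger name, and message wording below are fabricated for illustration (a sample file is generated so the commands run as-is), so match against whatever WAL-related WARN/ERROR lines your logs actually contain.

```shell
# Simulated tserver log; the message text and logger name are assumptions,
# not the exact strings Accumulo emits.
cat > /tmp/tserver_sample.log <<'EOF'
2016-05-31 10:00:01 WARN  log.DfsLogger: Failed to write to WAL, retrying
2016-05-31 10:00:02 WARN  log.DfsLogger: Failed to write to WAL, retrying
2016-05-31 10:00:03 INFO  log.DfsLogger: Creating new WAL file
EOF

# A steadily climbing count across restarts would point at the
# abandon-and-recreate loop described above.
grep -c 'Failed to write to WAL' /tmp/tserver_sample.log   # → 2
```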

Jayesh Patel wrote:
> We have a 3-node Accumulo 1.7 cluster running as VMware VMs with a minute
> amount of data compared to Accumulo standards.
> We have run into a situation multiple times now where all the nodes have
> a power failure and when they are trying to recover from it
> simultaneously, walog grows exponentially and fills up all the available
> disk space. We have confirmed that the walog folder under /accumulo in
> hdfs is consuming 99% of the disk space.
> We have tried freeing enough space to be able to run Accumulo processes
> in the hopes of it burning through walog without success. Walog just
> grew to take up the freed space.
> Given that we need to better manage the power situation, we’re trying to
> understand what could be causing this and if there’s anything we can do
> to avoid this situation.
> In case you're wondering: we have some heartbeat data being written to a
> table at a very small constant rate, which is not sufficient to cause such
> a large write-ahead log even if HDFS was pulled out from under Accumulo's
> feet, so to speak, during the power failure.
> Thank you,
> Jayesh
