accumulo-user mailing list archives

From "Slater, David M." <>
Subject RE: WALOG Design
Date Thu, 26 Sep 2013 15:04:14 GMT
Awesome, thanks!

For clarification, when the in-memory map is full and it contains data for multiple tablets,
how does it prioritize which tablets to minor compact? Is there a separate in-memory
map for each tablet on a tablet server? When the in-memory map is full, will it run minor compactions
until all of the data currently in it has been flushed, or will it trigger a smaller number of minor
compactions until it is back to a more reasonable size?

It is unclear to me if you can change the minimum size of rfiles written by minor compactions.

Also, how does the in memory map handle mutations for tablets that are currently doing a minor


-----Original Message-----
From: Christopher [] 
Sent: Wednesday, September 25, 2013 4:49 PM
To: Accumulo User List
Subject: Re: WALOG Design

Mutations are written to WALOGs when they are inserted into a TServer's in-memory map. The
TServer's in-memory map gets flushed to disk periodically, but there's a risk that the TServer
will die after the data has been ingested but before it is flushed to disk. The WALOGs, when
enabled, protect against this data loss by first writing out incoming data to a WALOG. Writing
to the WALOG is more efficient than creating RFiles, because it does not contain sorted data or
indexes. It's just a playback file, so that in case of a failure, Mutations that the client
believed had been ingested aren't lost.
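
As a rough illustration of the log-then-apply idea described above, here is a minimal toy sketch in Python. It is not Accumulo code: the class ToyTabletServer, its methods, and the JSON-lines log format are all hypothetical, and a plain dict stands in for both the in-memory map and the flushed files.

```python
import json
import os
import tempfile


class ToyTabletServer:
    """Toy model (hypothetical, not Accumulo): log each mutation, then apply it."""

    def __init__(self, walog_path):
        self.walog_path = walog_path
        self.memory_map = {}  # in-memory state, lost if the server dies
        self.flushed = {}     # stands in for data already flushed to disk

    def write(self, key, value):
        # Append to the log first, so an acknowledged mutation survives a crash.
        with open(self.walog_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memory_map[key] = value

    def flush(self):
        # Stand-in for a minor compaction: persist the map, then discard the log.
        self.flushed.update(self.memory_map)
        self.memory_map.clear()
        open(self.walog_path, "w").close()

    def recover(self):
        # Replay the logged mutations in order to rebuild the in-memory map.
        self.memory_map = {}
        if os.path.exists(self.walog_path):
            with open(self.walog_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.memory_map[key] = value


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        server = ToyTabletServer(os.path.join(tmp, "walog"))
        server.write("row1", "a")
        server.flush()               # row1 is now "on disk"
        server.write("row2", "b")    # row2 is only in memory and in the log
        server.memory_map.clear()    # simulate the server dying before a flush
        server.recover()             # replay restores row2 from the log
        return server.flushed, server.memory_map
```

In this sketch the log is an unsorted, append-only file, which is why writing it is cheaper than producing a sorted, indexed file on every mutation.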

Putting the WALOG in memory defeats the purpose of the WALOG, but it can be disabled (per-table),
if you care more about performance than protection against data loss. Don't disable it for
the !METADATA table, though...

You can also generate RFiles directly (perhaps using a M/R job) and bulk import them into
Accumulo, which bypasses the WALOG.

Christopher L Tubbs II

On Wed, Sep 25, 2013 at 4:39 PM, Slater, David M. wrote:
> First, thank you all for the responses on my BatchWriter question, as 
> I was able to increase my ingestion rate by a large factor. I am now 
> hitting disk i/o limits, which is forcing me to look at reducing file 
> copying. My primary thoughts concerning this are reducing the Hadoop 
> replication factor as well as reducing the number of major compactions.
> However, from what I understand about write ahead logs (in 1.4), even 
> if you remove all major compactions, all data will essentially be 
> written to disk twice: once to the WALOG in the local directory (HDFS 
> in 1.5), then from the WALOG to an RFile on HDFS. Is this 
> understanding correct?
> I’m trying to understand what the primary reasons are for having the WALOG.
> Is there any way to write directly to an RFile from the In-Memory Map 
> (or have the WALOG in memory)?
> Thanks,
> David