accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: How to reduce number of entries in memory
Date Tue, 29 Oct 2013 16:46:25 GMT
On 10/29/13, 12:28 PM, Terry P. wrote:
>
> What are your thoughts on doing an hourly flush of the table in the
> shell to ensure entries are flushed to disk more frequently to help
> minimize the replay required if connectivity to a node is lost?

If you want to go the route of flushing more frequently, I would 
probably suggest dropping tserver.walog.max.size from the default of 1G 
to something smaller (maybe 256M or 512M?).
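
To get a rough feel for what a smaller WAL buys you, here is a 
back-of-the-envelope sketch in Python. It only estimates how long one 
tserver takes to fill a WAL of a given size. The 1 MiB/s ingest rate is 
a made-up placeholder, the function name is hypothetical, and treating 
"1G" as 1024^3 bytes is an assumption, so plug in your own numbers.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3  # assumption: "1G" is read as 1024^3 bytes


def seconds_to_fill_wal(wal_max_bytes: int, ingest_bytes_per_sec: float) -> float:
    """Estimate how long one tserver takes to fill a WAL of the given size.

    Assumes a steady ingest rate; real ingest is usually burstier.
    """
    if ingest_bytes_per_sec <= 0:
        raise ValueError("ingest rate must be positive")
    return wal_max_bytes / ingest_bytes_per_sec


if __name__ == "__main__":
    rate = 1 * MIB  # hypothetical per-tserver ingest rate, bytes per second
    for label, size in [("1G", GIB), ("512M", 512 * MIB), ("256M", 256 * MIB)]:
        minutes = seconds_to_fill_wal(size, rate) / 60
        print(f"{label}: about {minutes:.1f} minutes to fill at 1 MiB/s")
```

If the per-server ingest rate is low, even a 256M WAL may take far 
longer than an hour to fill, which is worth checking before assuming a 
smaller WAL replaces the hourly flush.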

My gut is telling me that this still isn't going to help you in the end. 
What does the distribution on your ingest look like?

Looking back at some old emails from you, if you're ingesting UUIDs as 
the row key, you're most likely ingesting a "small" amount of data to 
many servers. If that's the case, you're really just playing the odds 
on whether you happen to catch a flush at the exact moment before you 
lose the N servers that contained your WALs.

Increasing the WAL replication is likely the best solution you can get 
for yourself. Counting on your failures to occur only after a flush but 
before you ingest more data isn't realistic. If you still want data 
flushed more often, reducing the WAL size does that automatically, which 
beats a manual cron job that flushes the table (one less thing to manage).
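
To put a number on "playing the odds", here is a small Python sketch of 
why more WAL replicas help. It assumes each WAL's replicas land on 
distinct nodes chosen uniformly at random and that the failed nodes are 
a random subset of the cluster. That is a simplification (real HDFS 
placement is not uniform), and the function name and cluster sizes are 
made up for illustration.

```python
from math import comb


def p_all_replicas_lost(nodes: int, failed: int, replication: int) -> float:
    """Probability that every replica of one WAL sits on a failed node.

    Simplified model: `replication` distinct nodes are picked uniformly at
    random out of `nodes`, and `failed` of those nodes are down at once.
    """
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    if not 1 <= replication <= nodes:
        raise ValueError("replication must be between 1 and nodes")
    if failed < replication:
        return 0.0  # not enough failed nodes to hold every replica
    # Favorable placements (all replicas on failed nodes) over all placements.
    return comb(failed, replication) / comb(nodes, replication)


if __name__ == "__main__":
    nodes, failed = 20, 3  # hypothetical cluster size and simultaneous failures
    for replication in (2, 3, 4):
        p = p_all_replicas_lost(nodes, failed, replication)
        print(f"replication={replication}: p(lose one WAL) = {p:.5f}")
```

This is the chance for a single WAL. With many WALs spread across the 
cluster, the chance of losing at least one is higher, but under this 
model each extra replica still cuts the per-WAL risk sharply.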

And, as you likely know, this would all be at the expense of ingest 
performance.
