accumulo-user mailing list archives

From "Terry P." <>
Subject Re: How to reduce number of entries in memory
Date Tue, 29 Oct 2013 16:28:24 GMT
Hi Josh,
Thanks for the advice.  I am of course concerned about the nodes dropping
out of the cluster, but we're in a position where we do not provide the
infrastructure and thus have no control over it.  Despite the
infrastructure having multiple points of redundancy, the network glitch
still happened and the nodes were evicted by the Accumulo Master.  So since
we've seen it happen once, I'm going to assume it is likely to happen again
someday and want to do whatever I can within my realm of control to help
ensure if/when it does happen, we can recover from it and keep on chugging.

I like the idea of upping the tserver.logger.count from 2 to 3, as we are
indeed stuck on v1.4.2 for the foreseeable future. More insurance is good.

What are your thoughts on doing an hourly flush of the table in the shell
to ensure entries are flushed to disk more frequently to help minimize the
replay required if connectivity to a node is lost?
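
For reference, a minimal sketch of what such a scheduled flush could look like, assuming the `accumulo` launcher is on the PATH and that the shell's `-e` option is available to run a single command non-interactively. The table name, the environment variable names, and the helper function names are all hypothetical, and passing a password on the command line exposes it to other local users, so treat this as an illustration rather than a recommended setup.

```python
import os
import subprocess


def build_flush_command(table, user, password):
    """Return the argv for a one-shot Accumulo shell 'flush' of one table.

    Assumption: 'accumulo shell -u USER -p PASS -e COMMAND' runs COMMAND
    and exits. The table name is whatever table you want minor compacted.
    """
    return ["accumulo", "shell", "-u", user, "-p", password,
            "-e", "flush -t " + table]


def flush_table(table, user, password):
    # check_call raises CalledProcessError if the shell exits non-zero,
    # so a failed flush is visible to whatever scheduler runs this script.
    subprocess.check_call(build_flush_command(table, user, password))


if __name__ == "__main__":
    # Hypothetical environment variables and table name.
    flush_table("mytable",
                os.environ["ACCUMULO_USER"],
                os.environ["ACCUMULO_PASSWORD"])
```

Run once an hour from cron or a similar scheduler, this would trigger the same minor compaction as typing 'flush' in the shell by hand.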

On Mon, Oct 28, 2013 at 5:50 PM, Josh Elser <> wrote:

> It kind of sounds like you should be more concerned about nodes randomly
> dropping out of your cluster :)
> If you're stuck on 1.4 series, you can try to up the property
> 'tserver.logger.count' to '3' instead of the default of '2' to ensure that
> you have a greater chance of not losing a WAL replica.
> With 1.5, you'll get your HDFS replication (which is likely 3, as well).
> I'm not sure off the top of my head if the 1.4 loggers have any rack
> locality for nodes (e.g. one replica on rack, and one off rack).
> Regardless, trying to avoid network glitches is likely the better
> approach. If a tablet has any WAL file (regardless of the amount of data
> stored in it), you're still going to have to recover/replay that WAL
> before the tablet can come back online in the event of a failure happening.
> On 10/28/13, 6:20 PM, Terry P. wrote:
>> Thanks for the replies. I was approaching it from a data integrity
>> perspective, as in wanting it flushed to disk in case of a TabletServer
>> failure.  Last weekend we saw two TabletServers exit the cluster due to
>> a network glitch, and wouldn't you know that the 04 node was secondary
>> logger for the 03 node.
>> In our case, these entries are hanging around in memory /for hours/, as
>> the ingest rate is not that high.
>> Perhaps an hourly flush of the table via the shell to get it out to disk
>> would be the way to go?
>> On Mon, Oct 28, 2013 at 4:30 PM, Mike Drob <> wrote:
>>     What are you trying to accomplish by reducing the number of entries
>>     in memory? A tablet server will not minor compact (flush) until the
>>     native map fills up, but keeping things in memory isn't really a
>>     performance concern.
>>     You can force a one-time minor compaction via the shell using the
>>     'flush' command.
>>     On Mon, Oct 28, 2013 at 5:19 PM, Terry P. <> wrote:
>>         Greetings all,
>>         For a growing table that went from zero to 70 million
>>         entries this weekend, I'm seeing 4.4 million entries still in
>>         memory, though the client programs are supposed to be flushing
>>         their entries.
>>         Is there a server-side setting to help reduce the number of
>>         entries that are in memory (not yet flushed to disk)?  Our
>>         system has fairly light performance requirements, so I'm okay if
>>         a tweak may result in reduced ingest performance.
>>         Thanks in advance,
>>         Terry
