accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1770) out of memory error on very long running tablet server
Date Tue, 15 Oct 2013 20:14:42 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795580#comment-13795580 ]

Eric Newton commented on ACCUMULO-1770:
---------------------------------------

After noticing this little gem in the mallopt(3) man page:

{quote}
 there are some disadvantages to the use of mmap(2):
deallocated space is not placed on the free list for reuse by
later allocations
{quote}

Oh.  After poring over the man page, I decided to set the following two environment variables
in an attempt to get {{malloc}} to not allocate memory using {{mmap}}.

{noformat}
export MALLOC_MMAP_MAX_=0
export MALLOC_MMAP_THRESHOLD_=33554432
{noformat}

The first setting is meant to turn off {{mmap}} allocation entirely.  If that fails, the second
raises the threshold at which {{mmap}} is used to 32 MiB, which is larger than any of our memory
allocations.
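
When the server is started from a launcher script rather than a shell profile, the same two settings can be applied as an environment overlay on the child process. This is only a minimal sketch: {{malloc_env}} and {{launch}} are hypothetical helper names, not part of Accumulo, and it assumes the child is linked against glibc, which reads these variables when its allocator initializes.

```python
import os
import subprocess

# Values copied from the exports above; 33554432 bytes is 32 MiB.
MALLOC_SETTINGS = {
    "MALLOC_MMAP_MAX_": "0",
    "MALLOC_MMAP_THRESHOLD_": str(32 * 1024 * 1024),
}


def malloc_env(base=None):
    """Return a copy of the environment with the malloc settings applied.

    `base` defaults to the current process environment. The input mapping
    is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(MALLOC_SETTINGS)
    return env


def launch(argv):
    """Start `argv` as a child process with the malloc settings in place.

    Hypothetical helper: the settings only matter if the child uses
    glibc malloc.
    """
    return subprocess.Popen(argv, env=malloc_env())
```

The settings have to be in the environment before the process starts; changing them in an already running process has no effect on its allocator.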

I re-ran the FragmentTest with values of 50,000 to 100,000 bytes. I also left the WALog on and
added a 4-hour age-off. I've attached the graph, which shows the process stays under the 4G
limit.
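
For readers without the attachment, the shape of that workload (values of mixed size, dropped once they are four hours old) can be modeled roughly as follows. This is an illustrative sketch and not the attached FragmentTest.java: {{AgingStore}} and its methods are hypothetical names, the clock is passed in explicitly, and a Python process does not exercise the tablet server's native map.

```python
import random
from collections import deque

MIN_VALUE_BYTES = 50_000          # smallest value size used in the re-run
MAX_VALUE_BYTES = 100_000         # largest value size used in the re-run
AGE_OFF_SECONDS = 4 * 60 * 60     # 4-hour age-off


class AgingStore:
    """Holds values in insertion order and drops them once they age off."""

    def __init__(self, age_off_seconds=AGE_OFF_SECONDS):
        self.age_off_seconds = age_off_seconds
        self.entries = deque()    # (insert_time, value), oldest first

    def insert(self, now, rng):
        """Insert one value of random size at time `now`; return its size."""
        size = rng.randint(MIN_VALUE_BYTES, MAX_VALUE_BYTES)
        self.entries.append((now, bytes(size)))
        self.expire(now)
        return size

    def expire(self, now):
        """Drop every entry that is at least `age_off_seconds` old."""
        while self.entries and now - self.entries[0][0] >= self.age_off_seconds:
            self.entries.popleft()

    def live_bytes(self):
        """Total size of the values still held."""
        return sum(len(value) for _, value in self.entries)
```

Because the freed values differ in size from the ones that replace them, a workload like this is the kind that can leave holes in the allocator's free space over a long run.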

Updated test program attached, too.


> out of memory error on very long running tablet server
> ------------------------------------------------------
>
>                 Key: ACCUMULO-1770
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1770
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>         Attachments: FragmentTest.java, javamap.png, memory-usage.png, nativemap.png, three-day-tserver.png
>
>
> On a large cluster it was noticed that a few of the tablet servers had been pushed into
> swap.  This didn't affect the performance of the server until it ran out of memory, and the
> process was killed.  The gc reports in the debug log showed the system had plenty of heap
> space for the JVM.  The number of threads in the server was not excessive (dozens).  This
> cluster ingests some large values (megabytes).  The tablet server had been up for a month
> prior to running out of memory.  MALLOC_ARENA_MAX had already been set to 1.
> * Investigate the effect of fragmentation on memory usage for large value inserts.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
