db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-5416) SYSCS_COMPRESS_TABLE causes an OutOfMemoryError when the heap is full at call time and then gets mostly garbage collected later on
Date Mon, 09 Dec 2013 15:01:18 GMT

    [ https://issues.apache.org/jira/browse/DERBY-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843219#comment-13843219 ]

Knut Anders Hatlen commented on DERBY-5416:
-------------------------------------------

The code that decides whether or not to grow the sort buffer essentially works like this
in the failing case (a simplified sketch follows the list):

- When the sort buffer is initialized, it records the amount of memory currently in use, and
allocates a small buffer.

- When the buffer is full, it checks the amount of memory currently in use. It intends to
use the difference between the current usage and the initial usage as an estimate of how much
memory a doubling of the sort buffer requires. However, since a gc has happened, the difference
is negative. Since there is more memory available now than when the buffer was initialized,
it assumes that it is safe to allocate at least as much extra space now as it successfully
allocated earlier, when less memory was available. So it doubles the buffer size. This sounds
like a fair assumption.

- The next time the buffer is full, it still sees that the memory usage is smaller than the
initial memory usage. Again it assumes that it is safe to double the buffer size, and does
exactly that. However, at this point, the assumption is not as fair. Notice the difference
between the assumption in this step and in the previous step: in the previous step, it was
assumed safe to grow the buffer by as much space as we added when the buffer was initialized.
In this step, we don't grow the buffer by the same amount as we initially gave it;
we actually grow it by twice that amount. This step is repeated each time the buffer gets
full, and each time the amount we add gets doubled (way beyond the initial amount that we
regarded as a safe increment). Eventually, the buffer gets too large for the heap, and we
get an OOME.
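
To make the failing sequence concrete, here is a simplified sketch of the heuristic as
described above. The class and its names are made up for illustration; the real logic lives
in MergeInserter.insert(...):

{code:java}
// Simplified sketch of the growth heuristic described above.
// Hypothetical names; not the actual Derby code.
class SortBufferGrowthSketch {
    private final long initialMemoryUsed; // recorded when the buffer is created
    private long bufferSize;              // current capacity

    SortBufferGrowthSketch(long initialBufferSize) {
        Runtime rt = Runtime.getRuntime();
        this.initialMemoryUsed = rt.totalMemory() - rt.freeMemory();
        this.bufferSize = initialBufferSize;
    }

    // Called each time the buffer fills up.
    void onBufferFull() {
        Runtime rt = Runtime.getRuntime();
        long currentMemoryUsed = rt.totalMemory() - rt.freeMemory();
        // Intended as "memory the buffer has consumed so far", but after
        // a big gc the difference is negative.
        long estimatedMemoryUsed = currentMemoryUsed - initialMemoryUsed;
        if (estimatedMemoryUsed < 0) {
            // Memory usage is below the initial level, so doubling is
            // assumed safe. Repeated every time the buffer fills, this
            // doubles the increment itself until the heap is exhausted.
            bufferSize *= 2;
        }
        // (The real code also grows the buffer when the estimate suggests
        // there is enough free heap; omitted here.)
    }
}
{code}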

I see at least three ways we could improve the heuristic to avoid this problem (sketches
of these ideas follow the list):

1. Instead of using the difference between the current memory usage and the initial memory
usage for estimating the memory requirements, we could use the difference between the current
memory usage and the memory usage the previous time the buffer was doubled. Then a big gc
right after the allocation of the buffer won't affect all upcoming estimates, only the estimate
calculated the first time the buffer is full.

2. When we don't have an estimate of the memory requirement for doubling the buffer (because
of a gc), and the current memory usage is smaller than the initial memory usage, don't assume
blindly that it is OK to double the buffer. Instead, grow it by the amount of memory that
we found it was safe to add initially, when the memory usage was at least as high as it is
now. This would mean a doubling of the buffer the first time the buffer gets full, but less
than that from the second time the buffer gets full. (In the common case, where we do have
an estimate of the memory usage, a doubling will happen each time the buffer gets full, as
long as the estimate suggests there's enough free heap space.) In other words, use a more
conservative approach and grow the buffer more slowly when we don't have a good estimate for
the actual memory requirements.

3. Since the buffer contains arrays of DataValueDescriptors, we may be able to estimate the
memory requirements the same way as we do for BackingStoreHashtable. That is, by calling estimateMemoryUsage()
on the DataValueDescriptors to see approximately how much space a single row takes. (Currently,
this approach underestimates the actual memory requirements. See DERBY-4620.)
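
For illustration, here is a sketch of what ideas 1 and 2 could look like combined. All names
and the safety margin are hypothetical, not actual Derby code:

{code:java}
// Sketch of ideas 1 and 2: measure growth cost against the previous
// doubling, and fall back to a conservative increment when a gc has
// spoiled the estimate. Hypothetical names and thresholds.
class ImprovedGrowthSketch {
    private final long initialBufferSize; // increment known to be safe
    private long memoryUsedAtLastGrowth;  // idea 1: baseline moves forward
    private long bufferSize;

    ImprovedGrowthSketch(long initialBufferSize) {
        Runtime rt = Runtime.getRuntime();
        this.memoryUsedAtLastGrowth = rt.totalMemory() - rt.freeMemory();
        this.initialBufferSize = initialBufferSize;
        this.bufferSize = initialBufferSize;
    }

    void onBufferFull() {
        Runtime rt = Runtime.getRuntime();
        long currentMemoryUsed = rt.totalMemory() - rt.freeMemory();
        long freeHeap = rt.maxMemory() - currentMemoryUsed;
        // Idea 1: compare against the previous growth, so a single gc
        // only spoils one estimate, not all subsequent ones.
        long estimatedGrowthCost = currentMemoryUsed - memoryUsedAtLastGrowth;
        if (estimatedGrowthCost >= 0) {
            if (estimatedGrowthCost < freeHeap / 2) { // hypothetical margin
                bufferSize *= 2;
                memoryUsedAtLastGrowth = currentMemoryUsed;
            }
        } else {
            // Idea 2: no usable estimate (a gc happened). Grow only by the
            // increment we know was safe initially, instead of doubling.
            bufferSize += initialBufferSize;
            memoryUsedAtLastGrowth = currentMemoryUsed;
        }
    }
}
{code}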
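And a sketch of idea 3, estimating the per-row footprint directly from the row's
DataValueDescriptors, the way BackingStoreHashtable does. Only
DataValueDescriptor.estimateMemoryUsage() is Derby's API; the helper class is hypothetical,
and per DERBY-4620 the estimate runs low:

{code:java}
import org.apache.derby.iapi.types.DataValueDescriptor;

// Sketch of idea 3: size the buffer from the rows themselves.
class RowSizeEstimateSketch {
    // Approximate heap cost of one row in the sort buffer.
    static long estimateRowSize(DataValueDescriptor[] row) {
        long size = 0;
        for (DataValueDescriptor dvd : row) {
            size += dvd.estimateMemoryUsage();
        }
        return size;
    }

    // Doubling adds roughly rowsInBuffer more rows. The factor of 2 is a
    // hypothetical safety margin, needed because estimateMemoryUsage()
    // currently underestimates (DERBY-4620).
    static boolean safeToDouble(long rowsInBuffer, long perRowSize, long freeHeap) {
        return rowsInBuffer * perRowSize * 2 < freeHeap;
    }
}
{code}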

> SYSCS_COMPRESS_TABLE causes an OutOfMemoryError when the heap is full at call time and
then gets mostly garbage collected later on
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5416
>                 URL: https://issues.apache.org/jira/browse/DERBY-5416
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.6.2.1, 10.7.1.1, 10.8.1.2
>            Reporter: Ramin Baradari
>            Priority: Critical
>              Labels: derby_triage10_9
>         Attachments: compress_test_5416.patch
>
>
> When compressing a table with an index that is larger than the maximum heap size, and
therefore cannot be held in memory as a whole, an OutOfMemoryError can occur.
> For this to happen, the heap usage must be close to the maximum heap size at the start
of the index recreation, and then, while the entries are sorted, a garbage collection run
must clean out most of the heap. This can happen because a concurrent process releases a huge
chunk of memory, or just because the buffer of a previous table compression has not yet been
garbage collected. 
> The internal heuristic that guesses when more memory can be used by the merge inserter
estimates that more memory is available, and the sort buffer gets doubled. The buffer size
keeps getting doubled until the heap usage is back at the level from when the merge inserter
was first initialized, or until the OOM occurs.
> The problem lies in MergeInserter.insert(...). The check for whether the buffer can be doubled
contains the expression "estimatedMemoryUsed < 0", where estimatedMemoryUsed is the difference
between the current heap usage and the heap usage at initialization. Unfortunately, in the
aforementioned scenario this expression remains true until the heap usage gets close to the
maximum heap size, so the doubling of the buffer size is not stopped before then.
> I've tested it with 10.6.2.1, 10.7.1.1, and 10.8.1.2, but the actual bug most likely exists
in prior versions too.
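
For reference, the failing operation is triggered by an ordinary call to the compression
procedure. A minimal sketch of such a call (connection URL and table name are hypothetical;
only SYSCS_UTIL.SYSCS_COMPRESS_TABLE is Derby's):

{code:java}
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CompressTableSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
             CallableStatement cs =
                     conn.prepareCall("CALL SYSCS_UTIL.SYSCS_COMPRESS_TABLE(?, ?, ?)")) {
            cs.setString(1, "APP");       // schema
            cs.setString(2, "BIGTABLE");  // table whose index exceeds the heap
            cs.setShort(3, (short) 1);    // sequential mode
            // If the heap is nearly full here and a gc later frees most of
            // it, the sort-buffer heuristic described above can keep
            // doubling until an OutOfMemoryError is thrown.
            cs.execute();
        }
    }
}
{code}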



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
