accumulo-user mailing list archives

From Anthony Fox <adfaccu...@gmail.com>
Subject Re: tservers running out of heap space
Date Thu, 29 Nov 2012 20:50:26 GMT
Ok, a bit more info.  I set -XX:+HeapDumpOnOutOfMemoryError and took a look
at the heap dump.  The thread that caused the OOM is reading a column
family bloom filter from the CacheableBlockFile.  The class taking up the
memory is long[], which seems consistent with a bloom filter.  Does
this sound right?  Any guidance on settings to tweak related to bloom
filters to alleviate this issue?
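
In case it helps, the only knobs I have found so far are the per-table bloom
filter properties, so my plan is to experiment with something like the
following in the Accumulo shell (the table name 'mytable' is just a
placeholder for ours):

    # show the current bloom filter settings for the table
    config -t mytable -f table.bloom

    # disable bloom filters; affects files written by future flushes and
    # compactions
    config -t mytable -s table.bloom.enabled=false

Disabling them outright obviously gives up the seek-time benefit, which is
why I'd appreciate a sanity check before going that route.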


On Thu, Nov 29, 2012 at 2:24 PM, Anthony Fox <adfaccuser@gmail.com> wrote:

> Since the scan involves an intersecting iterator, it has to scan the
> entire row range.  Also, it's not even very many concurrent clients -
> between 5 and 10.  Should I turn compression off on this table or is that
> a bad idea in general?
>
>
> On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <keith@deenlo.com> wrote:
>
>>
>>
>> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <adfaccuser@gmail.com> wrote:
>>
>>> We're not on 1.4 yet, unfortunately.  Are there any config params I can
>>> tweak to manipulate the compressor pool?
>>
>>
>> Not that I know of, but it's been a while since I looked at that.
>>
>>
>>>
>>>
>>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <keith@deenlo.com> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <adfaccuser@gmail.com> wrote:
>>>>
>>>>> Compacting down to a single file is not feasible - there's about 70G
>>>>> in 255 tablets across 15 tablet servers.  Is there another way to tune
>>>>> the compressor pool or another mechanism to verify that this is the
>>>>> issue?
>>>>
>>>>
>>>> I suppose another way to test this would be to run a lot of concurrent
>>>> scans, but not enough to kill the tserver.  Then get a heap dump of the
>>>> tserver and see if it contains a lot of 128k or 256k (cannot remember
>>>> exact size) byte arrays that are referenced by the compressor pool.
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <keith@deenlo.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <adfaccuser@gmail.com> wrote:
>>>>>>
>>>>>>> I am experiencing some issues running multiple parallel scans
>>>>>>> against Accumulo.  Running single scans works just fine but when
>>>>>>> I ramp up the number of simultaneous clients, my tablet servers die
>>>>>>> due to running out of heap space.  I've tried raising max heap to 4G,
>>>>>>> which should be more than enough, but I still see this error.  I've
>>>>>>> tried with table.cache.block.enable=false,
>>>>>>> table.cache.index.enable=false, and table.scan.cache.enable=false,
>>>>>>> and all combinations of caching enabled as well.
>>>>>>>
>>>>>>> My scans involve a custom intersecting iterator that maintains no
>>>>>>> more state than the top key and value.  The scans also do a bit of
>>>>>>> aggregation on column qualifiers but the result is small and the
>>>>>>> number of returned entries is only in the dozens.  The size of each
>>>>>>> returned value is only around 500 bytes.
>>>>>>>
>>>>>>> Any ideas why this may be happening or where to look for further
>>>>>>> info?
>>>>>>>
>>>>>>
>>>>>> One known issue is Hadoop's compressor pool.  If you have a tablet
>>>>>> with 8 files and you query 10 terms, you will allocate 80
>>>>>> decompressors.  Each decompressor uses 128K.  If you have 10
>>>>>> concurrent queries, 10 terms, and 10 files, then you will allocate
>>>>>> 1000 decompressors.  These decompressors come from a pool that never
>>>>>> shrinks.  So if you allocate 1000 at the same time, they will stay
>>>>>> around.
>>>>>>
>>>>>> Try compacting your table down to one file and rerun your query just
>>>>>> to see if that helps.  If it does, then that's an important clue.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anthony
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
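
For anyone who finds this thread later, Keith's compressor-pool arithmetic
above works out roughly as follows (a back-of-the-envelope sketch using the
~128K-per-decompressor figure he mentions):

    10 concurrent queries x 10 query terms x 10 files per tablet
        = 1000 decompressors checked out at once
    1000 decompressors x ~128 KB each = ~125 MB

    Since the pool never shrinks, that memory stays referenced in the
    tserver heap after the scans finish.

To run the experiment discussed above, the shell and JDK commands would look
something like this ('mytable' is again a placeholder and <tserver-pid> is
the tablet server's process id):

    # compact the table down to one file per tablet and wait for completion
    compact -t mytable -w

    # take a live heap dump of a tserver and look for pooled 128k/256k
    # byte arrays referenced by the compressor pool
    jmap -dump:live,format=b,file=tserver.hprof <tserver-pid>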
