accumulo-dev mailing list archives

From Keith Turner <>
Subject Re: ScannerIterator thread use
Date Tue, 01 Nov 2011 22:35:00 GMT
On Tue, Nov 1, 2011 at 6:12 PM, Keith Massey
<> wrote:
> On 11/1/11 3:31 PM, Keith Turner wrote:
>> On Tue, Nov 1, 2011 at 4:00 PM, Keith Massey
>> <>  wrote:
>>> We're querying accumulo through a web application. After it had been hit
>>> with one of our test scripts for a few minutes with the debugger attached
>>> I noticed that there were hundreds and hundreds of threads being garbage
>>> collected. Eventually it crashes my IDE and the server becomes unresponsive.
>>> The server recovers eventually. After looking through the code a little bit,
>>> it appears that these threads are coming from
>>> org.apache.accumulo.core.client.impl.ScannerIterator.initiateReadAhead().
>>> We actually get many threads per iterator. Is there any reason that it can't
>>> use a thread pool instead of creating a new thread for every call to that
>>> method?
>>> Thanks.
>>> Keith
>> The reason I did not use a thread pool is because the scanner does not
>> have a close method.  I suppose we could use a thread pool where the
>> threads timeout when not used.  This could still lead to a lot of
>> threads depending on the timeout and how many scanner iterators are
>> created.
>> The BatchScanner and BatchWriter interfaces use thread pools and have
>> close methods.
>> Do you think this issue needs a ticket?
> I'm not incredibly familiar with this code, but it could be a static thread
> pool right? And just let all ScannerIterators share some configurable thread
> pool? The thread would just be returned to the pool when the Reader
> completed.

A static thread pool may limit a user's ability to control the behavior
when they have multiple threads using scanners.  Along that line of
thought, letting the user pass in a thread pool is a flexible
solution.  It gives the user a lot of control.  The scanner factory
method could accept a thread pool as an argument.  The drawback is
that it makes things more cumbersome for users who are doing
something simple.
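
A rough sketch of how the two ideas could be combined, using only
java.util.concurrent.  The class name ReadAheadSketch, its constructors,
and the 3 second idle timeout are made up for illustration; none of this is
the actual ScannerIterator code or an existing Accumulo API.  The caller may
pass in a pool, and the simple case falls back to a shared pool whose threads
time out when idle, so no close method is required:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a scanner's read-ahead logic; not Accumulo's API.
class ReadAheadSketch {

    // Shared default pool: threads are created on demand and exit after
    // 3 idle seconds (an arbitrary value). Daemon threads do not block JVM exit.
    private static final ExecutorService DEFAULT_POOL = new ThreadPoolExecutor(
            0, Integer.MAX_VALUE, 3L, TimeUnit.SECONDS,
            new SynchronousQueue<Runnable>(),
            r -> {
                Thread t = new Thread(r, "read-ahead");
                t.setDaemon(true);
                return t;
            });

    private final ExecutorService pool;

    // Simple case: the caller does not have to think about threads.
    ReadAheadSketch() {
        this(DEFAULT_POOL);
    }

    // Flexible case: the caller supplies, sizes, and shuts down the pool.
    ReadAheadSketch(ExecutorService pool) {
        this.pool = pool;
    }

    // Runs the read on a pooled thread instead of creating a new Thread per call.
    <T> Future<T> initiateReadAhead(Callable<T> reader) {
        return pool.submit(reader);
    }
}
```

With something like this, a user running many scanners could hand in a
bounded pool (for example Executors.newFixedThreadPool(n)) to cap the thread
count, while the no-argument form keeps the simple case simple.  As noted
earlier in the thread, the default pool can still grow large if many read
aheads are in flight at once.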
