accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: ScannerIterator thread use
Date Wed, 02 Nov 2011 14:43:08 GMT
On Wed, Nov 2, 2011 at 10:18 AM, Keith Massey
<keith.massey@digitalreasoning.com> wrote:
> On 11/1/11 9:53 PM, Keith Turner wrote:
>>
>> On Tue, Nov 1, 2011 at 6:12 PM, Keith Massey
>> <keith.massey@digitalreasoning.com>  wrote:
>>>
>>> I'm not incredibly familiar with this code, but it could be a static
>>> thread pool right? And just let all ScannerIterators share some
>>> configurable thread pool? The thread would just be returned to the
>>> pool when the Reader completed.
>>>
>> When I think of thread pools, I always think of setting an upper bound
>> on the number of threads.  It occurred to me that we could use a
>> static thread pool if it were unbounded.  This would replicate the
>> current behavior and allow for thread reuse.  So make the core size
>> small (0,1 or 2),  the max size MAX_INT, the timeout small (few
>> seconds), and use a SynchronousQueue.  Everything added to the pool
>> should create a new thread if one is not available.  Also make the
>> threads daemon threads so they do not keep the process alive.
>
> I think that would actually be much better than replicating the current
> behavior -- most of those threads seem to be very short-lived and we seem to
> get into trouble because the garbage collector is not reclaiming them fast
> enough (and I'm guessing we're bumping up against our ulimit). An unbounded
> pool would probably stay relatively small in most cases. Having the option
> of passing in a bounded thread pool would be nice though. If we have
> hundreds of users querying accumulo at once we'll probably need some way to
> bound the number of threads so we don't crash our server (although I guess
> we could do that in our code that calls accumulo).
>
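
For reference, the unbounded pool described in the quoted text above
could be sketched roughly as follows.  This is only an illustration
using java.util.concurrent, not the code that will go into Accumulo;
the class name, method name, thread name prefix, and the 3 second
timeout are all made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper; the names here are illustrative only.
final class ReadAheadPools {

    // Core size 0, max size Integer.MAX_VALUE, short idle timeout,
    // SynchronousQueue, daemon threads.
    static ExecutorService newReadAheadPool() {
        final AtomicInteger counter = new AtomicInteger();
        ThreadFactory daemonFactory = runnable -> {
            Thread t = new Thread(runnable, "read-ahead-" + counter.incrementAndGet());
            t.setDaemon(true); // do not keep the process alive
            return t;
        };
        // A SynchronousQueue holds no tasks, so a submitted task is either
        // handed to an idle thread or causes a new thread to be created.
        return new ThreadPoolExecutor(
                0,                  // no threads kept when idle
                Integer.MAX_VALUE,  // effectively unbounded
                3L, TimeUnit.SECONDS, // assumed "few seconds" idle timeout
                new SynchronousQueue<Runnable>(),
                daemonFactory);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newReadAheadPool();
        Callable<String> task = () -> Thread.currentThread().getName();
        // Run two tasks one after the other; the second will usually
        // reuse the idle thread left behind by the first.
        System.out.println(pool.submit(task).get());
        System.out.println(pool.submit(task).get());
        pool.shutdown();
    }
}
```

Apart from the shorter timeout and the daemon threads, this is the
same configuration that Executors.newCachedThreadPool() uses.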

OK, I will create a ticket.  One thing you could do with the current
code is increase the batch size on the scanner.  I think it is 1000 by
default.  After the scanner reads a few batches, it starts kicking off
a read-ahead thread to fetch each subsequent batch.  Since a thread is
created per batch, increasing the batch size will decrease the frequency
of thread creation by the scanner.  You could try 2000, 4000, or 8000.
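
To put rough numbers on that, here is a small back-of-the-envelope
calculation.  It assumes at most one read-ahead thread per batch and
ignores the first few batches that are read without read ahead; the
scan size of one million entries and the class name are made up for
the example.

```java
// Hypothetical estimate, not part of Accumulo.
final class BatchEstimate {

    // Number of batches needed to return `entries` entries, which is an
    // upper bound on the read-ahead threads created under the
    // one-thread-per-batch assumption.
    static long batches(long entries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (entries + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        long entries = 1_000_000L; // assumed scan size
        for (int size : new int[] {1000, 2000, 4000, 8000}) {
            System.out.println("batch size " + size + ": at most "
                    + batches(entries, size) + " threads");
        }
    }
}
```

So each doubling of the batch size roughly halves the number of
threads created over the life of the scan, at the cost of holding
more entries in memory per batch.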

Keith
