accumulo-dev mailing list archives

From Keith Turner <>
Subject Re: ScannerIterator thread use
Date Wed, 02 Nov 2011 14:43:08 GMT
On Wed, Nov 2, 2011 at 10:18 AM, Keith Massey
<> wrote:
> On 11/1/11 9:53 PM, Keith Turner wrote:
>> On Tue, Nov 1, 2011 at 6:12 PM, Keith Massey
>> <>  wrote:
>>> I'm not incredibly familiar with this code, but it could be a static
>>> thread pool, right? All ScannerIterators could just share some
>>> configurable thread pool, and each thread would be returned to the pool
>>> when its Reader completed.
>> When I think of thread pools, I always think of setting an upper bound
>> on the number of threads. It occurred to me that we could use a
>> static thread pool if it were unbounded. This would replicate the
>> current behavior and allow for thread reuse. So make the core size
>> small (0, 1, or 2), the max size Integer.MAX_VALUE, the keep-alive timeout
>> short (a few seconds), and use a SynchronousQueue. Every task submitted
>> to the pool should create a new thread if an idle one is not available.
>> Also make the threads daemon threads so they do not keep the process alive.
> I think that would actually be much better than replicating the current
> behavior: most of those threads seem to be very short-lived, and we seem to
> get into trouble because the garbage collector is not reclaiming them fast
> enough (and I'm guessing we're bumping up against our ulimit). An unbounded
> pool would probably stay relatively small in most cases. Having the option
> of passing in a bounded thread pool would be nice, though. If we have
> hundreds of users querying Accumulo at once, we'll probably need some way to
> bound the number of threads so we don't crash our server (although I guess
> we could do that in our code that calls Accumulo).
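A minimal Java sketch of the pool discussed above, assuming Java 8 or later. `ReadAheadPools`, `unbounded`, `bounded`, and `SHARED` are hypothetical names, not part of Accumulo, and the core size of 1 and 3-second keep-alive are illustrative picks from the ranges suggested in the thread. For comparison, the JDK's `Executors.newCachedThreadPool()` builds nearly the same pool (core size 0, maximum `Integer.MAX_VALUE`, 60-second keep-alive, `SynchronousQueue`), but its threads are not daemons. The `bounded` variant is one assumed way to offer the cap Keith Massey asks for: with a `SynchronousQueue`, a task that finds every thread busy would normally be rejected, so it uses `CallerRunsPolicy` to run that task on the submitting thread instead.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical holder for a shared scanner read-ahead pool (not Accumulo API). */
public final class ReadAheadPools {

    private ReadAheadPools() {}

    /** Names each thread and marks it as a daemon so it never keeps the JVM alive. */
    private static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger count = new AtomicInteger();
        return r -> {
            Thread t = new Thread(r, prefix + "-" + count.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    /**
     * Small core, unbounded maximum, short keep-alive, SynchronousQueue.
     * Once the core threads exist, a submitted task is handed to an idle
     * thread if one is waiting; otherwise a new thread is created. Threads
     * above the core size exit after sitting idle for the keep-alive time.
     */
    public static ThreadPoolExecutor unbounded(int coreSize, long keepAliveSeconds) {
        return new ThreadPoolExecutor(
                coreSize, Integer.MAX_VALUE,
                keepAliveSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                daemonFactory("scanner-read-ahead"));
    }

    /**
     * Capped alternative: at most maxSize pool threads. When all of them are
     * busy, CallerRunsPolicy runs the task on the submitting thread, which
     * throttles the caller instead of throwing RejectedExecutionException.
     */
    public static ThreadPoolExecutor bounded(int coreSize, int maxSize, long keepAliveSeconds) {
        return new ThreadPoolExecutor(
                coreSize, maxSize,
                keepAliveSeconds, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                daemonFactory("scanner-read-ahead-bounded"),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** One pool that every scanner in the process could share (assumed defaults). */
    public static final ThreadPoolExecutor SHARED = unbounded(1, 3);
}
```

A scanner would then submit each read-ahead task with `ReadAheadPools.SHARED.submit(...)` rather than calling `new Thread(...)`, so short-lived reads reuse idle threads instead of leaving dead ones for the garbage collector.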

OK, I will create a ticket. One thing you could do with the current
code is increase the batch size on the scanner. I think it is 1000 by
default. After the scanner reads a few batches, it starts kicking off
the read-ahead thread to read batches. Since a thread is created per
batch, increasing the batch size will decrease the frequency of thread
creation by the scanner. You could try 2000, 4000, or 8000.
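A back-of-the-envelope Java sketch of why a larger batch means fewer threads. It assumes one read-ahead thread per batch, as described above, and ignores the first few batches read before read-ahead starts, so the counts are upper bounds. `BatchThreadEstimate` and the one-million-entry scan are hypothetical; in the Accumulo client API the batch size itself is set on the scanner with `Scanner.setBatchSize(int)`.

```java
/** Hypothetical estimate of read-ahead thread creation for a given batch size. */
public final class BatchThreadEstimate {

    private BatchThreadEstimate() {}

    /** Batches needed to scan totalEntries, i.e. an upper bound on threads created. */
    public static long batches(long totalEntries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Ceiling division: a partial final batch still needs its own fetch.
        return (totalEntries + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long entries = 1_000_000L; // hypothetical scan size
        for (int size : new int[] {1000, 2000, 4000, 8000}) {
            System.out.println(size + " -> " + batches(entries, size) + " batches");
        }
    }
}
```

Running `main` prints 1000, 500, 250, and 125 batches for the four sizes: each doubling of the batch size roughly halves the number of threads the scanner creates, at the cost of holding a larger batch in memory.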

