From: Keith Turner
To: accumulo-dev@incubator.apache.org
Date: Wed, 2 Nov 2011 10:43:08 -0400
Subject: Re: ScannerIterator thread use
In-Reply-To: <4EB15129.4090101@digitalreasoning.com>

On Wed, Nov 2, 2011 at 10:18 AM, Keith Massey wrote:
> On 11/1/11 9:53 PM, Keith Turner wrote:
>>
>> On Tue, Nov 1, 2011 at 6:12 PM, Keith Massey wrote:
>>>
>>> I'm not incredibly familiar with this code, but it could be a static
>>> thread pool right? And just let all ScannerIterators share some
>>> configurable thread pool? The thread would just be returned to the
>>> pool when the Reader completed.
>>>
>> When I think of thread pools, I always think of setting an upper bound
>> on the number of threads. It occurred to me that we could use a
>> static thread pool if it were unbounded. This would replicate the
>> current behavior and allow for thread reuse. So make the core size
>> small (0, 1 or 2), the max size MAX_INT, the timeout small (few
>> seconds), and use a SynchronousQueue. Everything added to the pool
>> should create a new thread if one is not available. Also make the
>> threads daemon threads so they do not keep the process alive.
>
> I think that would actually be much better than replicating the current
> behavior -- most of those threads seem to be very short-lived and we seem
> to get into trouble because the garbage collector is not reclaiming them
> fast enough (and I'm guessing we're bumping up against our ulimit). An
> unbounded pool would probably stay relatively small in most cases. Having
> the option of passing in a bounded thread pool would be nice though. If
> we have hundreds of users querying accumulo at once we'll probably need
> some way to bound the number of threads so we don't crash our server
> (although I guess we could do that in our code that calls accumulo).
>

Ok, I will create a ticket.

One thing you could do w/ the current code is increase the batch size
on the scanner. I think it is 1000 by default.
After the scanner reads a few batches, it starts kicking off the
read-ahead thread to read batches. Since a thread is created per batch,
increasing the batch size will decrease the frequency of thread creation
by the scanner. You could try 2000, 4000, or 8000.

Keith
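
For reference, the unbounded static pool described in the quoted message could be sketched roughly as follows. This assumes plain java.util.concurrent and nothing from Accumulo itself. The class name ReadAheadPoolSketch, the method newReadAheadPool, the thread-name prefix, and the 3-second timeout are hypothetical choices for illustration, and the core size of 0 is just one of the suggested values (0, 1 or 2).

```java
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not code from the Accumulo source tree.
final class ReadAheadPoolSketch {

    // Builds an effectively unbounded pool that reuses idle threads.
    static ThreadPoolExecutor newReadAheadPool() {
        final AtomicInteger counter = new AtomicInteger();

        // Daemon threads do not keep the JVM alive at shutdown.
        ThreadFactory daemonFactory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "read-ahead-" + counter.incrementAndGet());
                t.setDaemon(true);
                return t;
            }
        };

        return new ThreadPoolExecutor(
                0,                               // core size: keep no idle threads around
                Integer.MAX_VALUE,               // max size: effectively unbounded
                3L, TimeUnit.SECONDS,            // idle threads exit after a few seconds (assumed value)
                new SynchronousQueue<Runnable>(),// hand-off queue: no task is ever buffered
                daemonFactory);
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = newReadAheadPool();
        // Each submission reuses an idle thread or starts a new one.
        Future<String> result = pool.submit(() -> Thread.currentThread().getName());
        System.out.println("task ran on " + result.get());
        pool.shutdown();
    }
}
```

This is close to what Executors.newCachedThreadPool(ThreadFactory) builds, which uses the same core size, maximum, and SynchronousQueue but a 60-second keep-alive. Lowering the maximum would give a bounded variant, though with a SynchronousQueue any task submitted while all threads are busy is rejected under the default handler, so callers would have to deal with that.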