From: ameet kini
Date: Wed, 26 Sep 2012 09:19:18 -0400
Subject: Re: number of query threads for batch scanner
To: user@accumulo.apache.org

So I decided to try something different and changed my splitting policy. This produced more tablets per tablet server and, interestingly, raised the maximum number of concurrent scans on that tablet server. With about 19 tablets, I was able to reach 6 concurrent scans, which ended up using all my cores (happy!). And I didn't change my numQueryThreads parameter; it stayed at the same, already very high, value.

But that leaves me wondering whether the maximum number of concurrent scans on a given tablet server depends on the number of tablets the scan hits on that server. If so, that is interesting, and not what I'd expected. Given that the underlying files are immutable, I'm not sure why there can't be, say, 4 concurrent scans on 1 tablet if there were 4 free cores to host those scans. What I'm seeing, as described above, is that I need to split my tablets further, into more than 4 tablets, to get 4 concurrent scans.

Ameet
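The message doesn't say which splitting policy was used. If the goal is simply more tablets per server, one option is to pre-split the table. The sketch below computes evenly spaced split rows under a loud assumption: row keys start with a fixed-width, uniformly distributed lowercase hex prefix (for example, a hash of the natural key). The class and method names are hypothetical; to take effect, each string would still need to be wrapped in `org.apache.hadoop.io.Text` and the set passed to `TableOperations.addSplits(tableName, splits)`.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: evenly spaced split rows for a table whose row keys are
// assumed to start with a uniformly distributed, fixed-width lowercase hex prefix.
class HexSplitPoints {

    static SortedSet<String> splits(int numTablets, int prefixWidth) {
        if (prefixWidth < 1 || prefixWidth > 7) {
            throw new IllegalArgumentException("prefixWidth must be between 1 and 7");
        }
        long space = 1L << (4 * prefixWidth); // number of distinct prefixes
        if (numTablets < 1 || numTablets > space) {
            throw new IllegalArgumentException("numTablets must be between 1 and " + space);
        }
        SortedSet<String> result = new TreeSet<>();
        // N tablets need N - 1 split rows.
        for (int i = 1; i < numTablets; i++) {
            long boundary = space * i / numTablets;
            // Zero-padded lowercase hex sorts the same way as the values it encodes.
            result.add(String.format("%0" + prefixWidth + "x", boundary));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splits(4, 2));         // [40, 80, c0]
        System.out.println(splits(19, 2).size()); // 18 split rows -> 19 tablets
    }
}
```

A `SortedSet` is returned because that is the collection type `addSplits` takes. Whether uniform split points suit a real table depends entirely on how its rows are distributed.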


On Tue, Sep 25, 2012 at 3:23 PM, ameet kini <ameetkini@gmail.com> wrote:
I should also state the not-so-obvious point that my Range spans the entire range of the four tablets in question.

Ameet
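Since the thread suggests the client divides work tablet by tablet, it can help to check how many tablets a given Range actually covers. The hypothetical helper below does this for an inclusive row range, given the table's sorted split rows. It follows Accumulo's tablet boundary convention (each tablet holds rows greater than the previous split row and less than or equal to its own split row) and ignores the column and timestamp parts of the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: which tablets does an inclusive row range [startRow, endRow] touch?
class TabletOverlap {

    // Tablet i covers rows in (splits[i-1], splits[i]]; tablet 0 has no lower bound
    // and the last tablet (index splits.size()) has no upper bound.
    // String.compareTo matches Accumulo's byte ordering only for ASCII rows.
    static List<Integer> tabletsTouched(List<String> splits, String startRow, String endRow) {
        List<Integer> touched = new ArrayList<>();
        for (int i = 0; i <= splits.size(); i++) {
            // The tablet's lower bound (exclusive) must be below the range end.
            boolean startsBeforeRangeEnd = i == 0 || splits.get(i - 1).compareTo(endRow) < 0;
            // The tablet's upper bound (inclusive) must be at or after the range start.
            boolean endsAtOrAfterRangeStart = i == splits.size() || splits.get(i).compareTo(startRow) >= 0;
            if (startsBeforeRangeEnd && endsAtOrAfterRangeStart) {
                touched.add(i);
            }
        }
        return touched;
    }

    public static void main(String[] args) {
        List<String> splits = Arrays.asList("g", "m", "t"); // three splits -> four tablets
        System.out.println(tabletsTouched(splits, "a", "z")); // [0, 1, 2, 3]
        System.out.println(tabletsTouched(splits, "h", "k")); // [1]
    }
}
```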

On Tue, Sep 25, 2012 at 3:17 PM, ameet kini <ameetkini@gmail.com> wrote:
Thanks, William.

The issue here is that without knowing how numQueryThreads translates into the number of concurrent scans, I cannot effectively tune that parameter to maximize resource usage on the tablet server. What I'm seeing is that even though there are four tablets on the tablet server, my number of concurrent scans never exceeds 3. This is despite setting numQueryThreads to a very high number and having 8 cores on the tablet server. I suspect that with 3 concurrent scans and no garbage collection happening at that moment, most of the cores are sitting idle.

Ameet

On Tue, Sep 25, 2012 at 3:08 PM, William Slacum <wilhelm.von.cloud@accumulo.net> wrote:
It should really depend on the resources available to the client. You can set an arbitrarily high number of threads, but you're still bound by the number of operations the CPU can run in parallel. I would assume the sweet spot is somewhere around that number. Try a small benchmark with 2, 4, 8, 16, etc. threads and see where your performance starts to level off.
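The benchmark suggested above can be scripted. This sketch times a workload at 2, 4, 8 and 16 threads and reports the first thread count after which the next step improves throughput by less than 10%. Both the 10% cutoff and the synthetic CPU-bound workload are arbitrary assumptions; to measure a real scan, the `IntConsumer` would instead create a BatchScanner with the given numQueryThreads, read all of its entries, and close it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

// Sketch of a thread-count sweep. The workload and the 10% "levels off"
// cutoff are arbitrary assumptions, not Accumulo settings.
class ThreadSweep {

    // Wall-clock seconds for one run of the workload with the given thread count.
    static double timeRun(IntConsumer workload, int threads) {
        long start = System.nanoTime();
        workload.accept(threads);
        return (System.nanoTime() - start) / 1e9;
    }

    // First thread count after which the next step in the sweep improves
    // throughput by less than minGain (0.10 = 10%); the last count if it never does.
    static int levelOffPoint(int[] threadCounts, double[] throughput, double minGain) {
        for (int i = 0; i + 1 < threadCounts.length; i++) {
            double gain = (throughput[i + 1] - throughput[i]) / throughput[i];
            if (gain < minGain) {
                return threadCounts[i];
            }
        }
        return threadCounts[threadCounts.length - 1];
    }

    // Synthetic stand-in for a scan: a fixed amount of CPU work spread over a pool.
    static void syntheticWorkload(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < 64; t++) {
            pool.submit(() -> {
                double x = 0;
                for (int i = 1; i < 2_000_000; i++) {
                    x += Math.sqrt(i);
                }
                return x;
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] counts = {2, 4, 8, 16};
        double[] throughput = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            double seconds = timeRun(ThreadSweep::syntheticWorkload, counts[i]);
            throughput[i] = 1.0 / seconds; // runs per second
            System.out.printf("%2d threads: %.3f s%n", counts[i], seconds);
        }
        System.out.println("levels off at about " + levelOffPoint(counts, throughput, 0.10) + " threads");
    }
}
```

Timings from a single run are noisy, so in practice each thread count would be measured several times before comparing.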


On Tue, Sep 25, 2012 at 11:45 AM, ameet kini <ameetkini@gmail.com> wrote:
Probably worth adding that the table mentioned below also has a bunch of tablets on other tablet servers, which is why I'm using a BatchScanner. I'm just not sure how numQueryThreads relates to the number of concurrent scans on a given tablet server.

Thanks


On Tue, Sep 25, 2012 at 2:22 PM, ameet kini <ameetkini@gmail.com> wrote:

I have a table with 4 tablets on a given tablet server. Depending on the numQueryThreads parameter below, I see a varying maximum number of concurrent scans on that table, anywhere from 1 to 3 (i.e., some values of numQueryThreads result in a maximum of 1 concurrent scan, some in 2, etc.). Can someone shed light on the relationship between numQueryThreads and the number of concurrent scans?

public BatchScanner createBatchScanner(String tableName,
                                       Authorizations authorizations,
                                       int numQueryThreads)

A follow-on question: what is the general rule of thumb for setting numQueryThreads? Should it be set to the number of hosted tablets expected to be consumed by that BatchScanner? To the number of tablet servers expected to be hit by that BatchScanner? Something else?

Thanks,
Ameet
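One way to reason about the numQueryThreads question is a simplified model of how a BatchScanner client might split its work. The assumption here (worth checking against the source of the release in use) is that ranges are first binned by tablet server and tablet, and that each server's tablets are then grouped into requests of at most `maxTabletsPerRequest` tablets, so a single tablet is never spread across requests. If that holds, the number of simultaneous scans a server sees is capped by the number of tablets the scan touches there, which would be consistent with the observation that more splits brought more concurrency. The class below is a hypothetical illustration, not Accumulo's code; it ignores server-side thread limits and timing, so it gives upper bounds rather than predictions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of how a BatchScanner client might group
// tablets into per-server requests. Names and the exact rule are assumptions.
class BatchScanModel {

    // Maximum number of tablets per request; Integer.MAX_VALUE means "no limit".
    static int maxTabletsPerRequest(Map<String, Integer> tabletsPerServer, int numQueryThreads) {
        int servers = tabletsPerServer.size();
        // Assumed rule: only break requests up when there are several threads per server.
        if (servers == 0 || numQueryThreads / servers <= 1) {
            return Integer.MAX_VALUE;
        }
        int totalTablets = 0;
        for (int n : tabletsPerServer.values()) {
            totalTablets += n;
        }
        return Math.max(1, totalTablets / numQueryThreads);
    }

    // Requests sent to each server. A tablet is never split across requests, so a
    // server never sees more requests than the tablets the scan touches there;
    // the numQueryThreads pool further limits how many run at the same time.
    static Map<String, Integer> requestsPerServer(Map<String, Integer> tabletsPerServer, int numQueryThreads) {
        int max = maxTabletsPerRequest(tabletsPerServer, numQueryThreads);
        Map<String, Integer> requests = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : tabletsPerServer.entrySet()) {
            int tablets = e.getValue();
            int n;
            if (tablets == 0) {
                n = 0;
            } else if (max == Integer.MAX_VALUE) {
                n = 1; // all of this server's tablets go in a single request
            } else {
                n = (tablets + max - 1) / max; // ceiling division
            }
            requests.put(e.getKey(), n);
        }
        return requests;
    }

    public static void main(String[] args) {
        Map<String, Integer> cluster = new LinkedHashMap<>();
        cluster.put("tserverA", 4);  // like the server in the question
        cluster.put("tserverB", 20); // hypothetical other servers
        cluster.put("tserverC", 20);
        for (int threads : new int[] {2, 8, 16, 64}) {
            System.out.println(threads + " threads -> " + requestsPerServer(cluster, threads));
        }
    }
}
```

With these made-up tablet counts, tserverA receives 1, 1, 2 and 4 requests for 2, 8, 16 and 64 threads; under this model, no thread count can push it past its 4 tablets.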






