Message-ID: <534C1E44.4000607@gmail.com>
Date: Mon, 14 Apr 2014 13:43:32 -0400
From: Josh Elser
To: user@accumulo.apache.org
Subject: Re: Optimal # proxy servers
In-Reply-To: <534C1D1A.8030609@gmail.com>

Hrm. 10x may have been overstating too. 5x is probably more accurate. YMMV :)

On 4/14/14, 1:38 PM, Josh Elser wrote:
> If you care about maximizing your throughput, ingest is probably not
> desirable through the proxy (you can probably get ~10x faster using the
> Java BatchWriter API).
>
> I wouldn't avoid the proxy server purely because of using batch_scans
> though. If you look at the Java impl of the BatchScanner, it essentially
> keeps a queue which many servers are concurrently throwing results onto
> and provides a Java Iterator over that queue to the client. With this in
> mind, this is very similar to what the proxy server is doing for you.
>
> On 4/14/14, 12:12 PM, David O'Gwynn wrote:
>> Ah, thanks Eric, that answers my question. It sounds like using the
>> proxy server for batch_scans and ingest is a bit beyond its scope. Are
>> there plans for beefing up the proxy to handle a wider range of
>> purposes from multiple clients?
>>
>> Thanks,
>> David
>>
>> On Mon, Apr 14, 2014 at 11:06 AM, Eric Newton wrote:
>>> High ingest and batch scans use resources within the proxy for queuing
>>> data. If I was using a proxy for these activities, I would want to
>>> have a proxy for each client. Administrative requests, and even basic
>>> single-range scans are simple pass-throughs with a much lower chance
>>> of overloading the proxy.
>>>
>>> On Mon, Apr 14, 2014 at 9:56 AM, David Medinets wrote:
>>>> "number of proxy servers should be proportional to the number of
>>>> clients" - I hate to be pedantic but this is a very general
>>>> statement. Can you be more specific? Should the proportion be 1:1
>>>> or 5:1? What factors affect the ratio?
>>>>
>>>> On Mon, Apr 14, 2014 at 9:32 AM, Eric Newton wrote:
>>>>> The number of proxy servers should be proportional to the number of
>>>>> clients.
>>>>>
>>>>> The proxy can talk to all the tablet servers, but the client of the
>>>>> proxy only has the proxy to make requests on its behalf.
>>>>>
>>>>> As always, it's going to depend on what you want to do, what your
>>>>> schema looks like, and the total number of servers you have.
>>>>>
>>>>> -Eric
>>>>>
>>>>> On Sun, Apr 13, 2014 at 11:58 PM, David O'Gwynn wrote:
>>>>>> Hi community,
>>>>>>
>>>>>> I was reading a thread "Error stressing with pyaccumulo app" from
>>>>>> February, and the topic of optimal number of proxy servers for a
>>>>>> cluster of a given size came up. Does anyone have any insight into
>>>>>> that question? Is there a thread in the archive that addresses this
>>>>>> question directly?
>>>>>>
>>>>>> My gut tells me that you should have a number proportional to the
>>>>>> number of tablet servers, but I'm afraid I don't really understand
>>>>>> what the proxy server is doing.
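
For readers who want to picture the BatchScanner behaviour described in the thread (many servers putting results onto one queue, with the client reading through an iterator), here is a rough Python sketch of that pattern. It is not Accumulo or proxy code: `fake_tablet_server_scan`, `batch_scan`, the server names, and the queue size of 100 are all made up for illustration, and the real implementation may differ in many details.

```python
import queue
import threading

_DONE = object()  # sentinel a worker puts on the queue when its scan ends


def fake_tablet_server_scan(server_id, num_entries):
    """Hypothetical stand-in for scanning one tablet server."""
    for i in range(num_entries):
        yield (f"row_{server_id}_{i}", f"value_{i}")


def batch_scan(server_ids, entries_per_server):
    """Yield entries from all servers through one shared, bounded queue."""
    # Bounded queue (size is an arbitrary assumption): workers block when
    # the consumer falls behind, so buffered data stays limited.
    results = queue.Queue(maxsize=100)

    def worker(server_id):
        try:
            for entry in fake_tablet_server_scan(server_id, entries_per_server):
                results.put(entry)
        finally:
            results.put(_DONE)  # always signal completion, even on error

    threads = [
        threading.Thread(target=worker, args=(sid,), daemon=True)
        for sid in server_ids
    ]
    for t in threads:
        t.start()

    # The consumer side: a plain iterator over the queue that stops once
    # every worker has reported that it is finished.
    finished = 0
    while finished < len(threads):
        item = results.get()
        if item is _DONE:
            finished += 1
        else:
            yield item


if __name__ == "__main__":
    # Entries from different servers arrive interleaved, in no fixed order.
    for key, value in batch_scan(["ts1", "ts2", "ts3"], 4):
        print(key, value)
```

The queue lives in whichever process runs the scan. If that process is a proxy serving several clients, each concurrent batch scan holds its own buffer there, which seems consistent with Eric's point that high ingest and batch scans use resources within the proxy for queuing data.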