accumulo-user mailing list archives

From Josh Elser
Subject Re: Optimal # proxy servers
Date Mon, 14 Apr 2014 17:38:34 GMT
If you care about maximizing your throughput, ingest through the proxy
is probably not desirable (you can probably get ~10x faster using the
Java BatchWriter API).

I wouldn't avoid the proxy server purely because of using batch_scans
though. If you look at the Java impl of the BatchScanner, it essentially
keeps a queue onto which many servers are concurrently throwing results,
and it hands the client a Java Iterator over that queue. With this in
mind, what the proxy server does for you is very similar.
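
To make the queue-plus-iterator idea concrete, here is a rough Python
sketch of the pattern. It is not Accumulo's actual code: fan_in_scan and
max_buffered are made-up names, plain lists stand in for tablet servers,
and the bounded queue is my own assumption about how you would keep a
slow consumer from piling up results in memory.

```python
import queue
import threading

_DONE = object()  # sentinel: one worker has finished its source


def fan_in_scan(sources, max_buffered=1000):
    """Yield items from all sources as they arrive, in no particular order.

    `sources` is a list of iterables, each standing in for the results
    coming back from one server (hypothetical, for illustration only).
    """
    # Bounded queue: producers block once the consumer falls behind.
    results = queue.Queue(maxsize=max_buffered)

    def worker(source):
        try:
            for item in source:
                results.put(item)
        finally:
            # Always signal completion, even if the source raised.
            results.put(_DONE)

    threads = [
        threading.Thread(target=worker, args=(s,), daemon=True)
        for s in sources
    ]
    for t in threads:
        t.start()

    # The consumer side: a single iterator draining the shared queue.
    remaining = len(threads)
    while remaining:
        item = results.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item


if __name__ == "__main__":
    tablet_results = [["a", "b"], ["c"], ["d", "e", "f"]]
    # Arrival order varies between runs, so sort before printing.
    print(sorted(fan_in_scan(tablet_results)))
```

Whoever holds that queue (the Java client or the proxy) is the one
paying the memory and thread cost for it.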

On 4/14/14, 12:12 PM, David O'Gwynn wrote:
> Ah, thanks Eric, that answers my question. It sounds like using the
> proxy server for batch_scans and ingest is a bit beyond its scope. Are
> there plans for beefing up the proxy to handle a wider range of
> purposes from multiple clients?
> Thanks,
> David
> On Mon, Apr 14, 2014 at 11:06 AM, Eric Newton wrote:
>> High ingest and batch scans use resources within the proxy for queuing
>> data.  If I was using a proxy for these activities, I would want to
>> have a proxy for each client.  Administrative requests, and even basic
>> single-range scans are simple pass-throughs with a much lower chance
>> of overloading the proxy.
>> On Mon, Apr 14, 2014 at 9:56 AM, David Medinets wrote:
>>> "number of proxy servers should be proportional to the number of
>>> clients" - I hate to be pedantic but this is a very general
>>> statement. Can you be more specific? Should the proportion be 1:1 or
>>> 5:1? What factors affect the ratio?
>>> On Mon, Apr 14, 2014 at 9:32 AM, Eric Newton wrote:
>>>> The number of proxy servers should be proportional to the number of
>>>> clients.
>>>> The proxy can talk to all the tablet servers, but the client of the
>>>> proxy only has the proxy to make requests on its behalf.
>>>> As always, it's going to depend on what you want to do, what your
>>>> schema looks like, and the total number of servers you have.
>>>> -Eric
>>>> On Sun, Apr 13, 2014 at 11:58 PM, David O'Gwynn wrote:
>>>>> Hi community,
>>>>> I was reading a thread "Error stressing with pyaccumulo app" from
>>>>> February, and the topic of optimal number of proxy servers for a
>>>>> cluster of a given size came up. Does anyone have any insight into
>>>>> that question? Is there a thread in the archive that addresses this
>>>>> question directly?
>>>>> My gut tells me that you should have a number proportional to the
>>>>> number of tablet servers, but I'm afraid I don't really understand
>>>>> what the proxy server is doing.
