accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Optimal # proxy servers
Date Mon, 14 Apr 2014 17:43:32 GMT
Hrm. 10x may have been overstating it, too. 5x is probably more accurate.

On 4/14/14, 1:38 PM, Josh Elser wrote:
> If you care about maximizing your throughput, ingest is probably not
> desirable through the proxy (you can probably get ~10x faster using the
> Java BatchWriter API).
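The Java BatchWriter buffers mutations in client memory and sends them to the tablet servers in batches. The toy sketch below illustrates only that buffering idea. All names are hypothetical (`BufferingWriter`, a character-count flush threshold, and a `Consumer` standing in for the network send). It is not Accumulo's implementation and makes no claim about where the throughput difference with the proxy comes from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of client-side batching; NOT Accumulo's BatchWriter.
public class BufferingWriterDemo {

    /** Accumulates records and hands them to a sink in batches. */
    static final class BufferingWriter implements AutoCloseable {
        private final long maxBufferedChars;          // hypothetical flush threshold (size in chars)
        private final Consumer<List<String>> sink;    // stands in for "send the batch to the servers"
        private final List<String> buffer = new ArrayList<>();
        private long bufferedChars = 0;

        BufferingWriter(long maxBufferedChars, Consumer<List<String>> sink) {
            this.maxBufferedChars = maxBufferedChars;
            this.sink = sink;
        }

        void add(String record) {
            buffer.add(record);
            bufferedChars += record.length();
            // Flush once the threshold is reached, so many records share one send.
            if (bufferedChars >= maxBufferedChars) {
                flush();
            }
        }

        void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            sink.accept(new ArrayList<>(buffer));  // hand off a copy of the current batch
            buffer.clear();
            bufferedChars = 0;
        }

        @Override
        public void close() {
            flush();  // send whatever is left over
        }
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BufferingWriter writer = new BufferingWriter(20, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 10; i++) {
                writer.add("row" + i + "=v");  // each record is 6 chars long
            }
        }
        System.out.println("batch sizes: " + batchSizes);  // prints "batch sizes: [4, 4, 2]"
    }
}
```

Ten records end up in three sends instead of ten; the real BatchWriter's thresholds and threading are configured through its own `BatchWriterConfig`, which this sketch does not model.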
> I wouldn't avoid the proxy server purely because you're using batch_scans,
> though. If you look at the Java impl of the BatchScanner, it essentially
> keeps a queue which many servers are concurrently throwing results onto
> and provides the client a Java Iterator over that queue. That is very
> similar to what the proxy server is doing for you.
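A simplified, hypothetical sketch of the pattern described above (not Accumulo's actual BatchScanner code): several producer threads, standing in for tablet servers, push results onto one bounded queue, and the client consumes them through a plain `java.util.Iterator`. The class names, the sentinel object, and the queue capacity are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of "many producers, one shared queue, one Iterator"; NOT Accumulo code.
public class QueueIteratorDemo {

    static final class QueueIterator implements Iterator<String> {
        private static final Object END = new Object();  // sentinel: one producer has finished
        private final BlockingQueue<Object> queue;
        private int producersRemaining;
        private String next;  // lookahead element; null when not yet fetched

        QueueIterator(List<List<String>> perServerResults, int queueCapacity) {
            this.queue = new ArrayBlockingQueue<>(queueCapacity);
            this.producersRemaining = perServerResults.size();
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, perServerResults.size()));
            for (List<String> results : perServerResults) {
                pool.submit(() -> {
                    try {
                        for (String r : results) {
                            queue.put(r);  // blocks while the queue is full (consumer is behind)
                        }
                        queue.put(END);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            pool.shutdown();  // accept no new tasks; threads exit once producers finish
        }

        @Override
        public boolean hasNext() {
            // Block until a real result arrives or every producer has sent its sentinel.
            while (next == null && producersRemaining > 0) {
                Object item;
                try {
                    item = queue.take();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for results", e);
                }
                if (item == END) {
                    producersRemaining--;
                } else {
                    next = (String) item;
                }
            }
            return next != null;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String result = next;
            next = null;
            return result;
        }
    }

    public static void main(String[] args) {
        List<List<String>> servers = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2"),
                List.of("c1"));
        QueueIterator it = new QueueIterator(servers, 2);
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        // Interleaving across producers is nondeterministic; the set of results is not.
        System.out.println(seen.size() + " results: " + seen);
    }
}
```

The bounded queue is where in-flight results sit, so whichever process holds it (the client when it uses a BatchScanner directly, or the proxy when a client scans through it) pays that buffering cost.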
> On 4/14/14, 12:12 PM, David O'Gwynn wrote:
>> Ah, thanks Eric, that answers my question. It sounds like using the
>> proxy server for batch_scans and ingest is a bit beyond its scope. Are
>> there plans for beefing up the proxy to handle a wider range of
>> purposes from multiple clients?
>> Thanks,
>> David
>> On Mon, Apr 14, 2014 at 11:06 AM, Eric Newton <>
>> wrote:
>>> High ingest and batch scans use resources within the proxy for queuing
>>> data. If I were using a proxy for these activities, I would want to
>>> have a proxy for each client. Administrative requests, and even basic
>>> single-range scans, are simple pass-throughs with a much lower chance
>>> of overloading the proxy.
>>> On Mon, Apr 14, 2014 at 9:56 AM, David Medinets
>>> <> wrote:
>>>> "number of proxy servers should be proportional to the number of
>>>> clients" -
>>>> I hate to be pedantic but
>>>> this is a very general statement. Can you be more specific? Should the
>>>> proportion be 1:1 or 5:1? What factors affect the ratio?
>>>> On Mon, Apr 14, 2014 at 9:32 AM, Eric Newton <>
>>>> wrote:
>>>>> The number of proxy servers should be proportional to the number of
>>>>> clients.
>>>>> The proxy can talk to all the tablet servers, but the client of the
>>>>> proxy only has the proxy to make requests on its behalf.
>>>>> As always, it's going to depend on what you want to do, what your
>>>>> schema looks like, and the total number of servers you have.
>>>>> -Eric
>>>>> On Sun, Apr 13, 2014 at 11:58 PM, David O'Gwynn <>
>>>>> wrote:
>>>>>> Hi community,
>>>>>> I was reading a thread "Error stressing with pyaccumulo app" from
>>>>>> February, and the topic of optimal number of proxy servers for a
>>>>>> cluster of a given size came up. Does anyone have any insight into
>>>>>> that question? Is there a thread in the archive that addresses this
>>>>>> question directly?
>>>>>> My gut tells me that you should have a number proportional to the
>>>>>> number of tablet servers, but I'm afraid I don't really understand
>>>>>> what the proxy server is doing.
