accumulo-user mailing list archives

From "John R. Frank"
Subject Re: Optimal # proxy servers
Date Mon, 11 Aug 2014 04:16:26 GMT


Following up on this earlier post about the proxy:

On 4/14/14, 1:38 PM, Josh Elser wrote:

> If you care about maximizing your throughput, ingest is probably not 
> desirable through the proxy (you can probably get ~10x faster using the 
> Java BatchWriter API).

> Hrm. 10x may have been overstating too. 5x is probably more accurate. 
> YMMV :)

Is there something more than the extra network hop that makes the proxy 
slow?  The proxy exposes a BatchWriter interface:

So, we can batch up multiple requests through the proxy.  Is there 
something else that is only available (only possible?) by going direct 
instead of through the proxy?

For example, is there a logical difference between what can be done with 
the Java BatchWriter API and this kind of batching loop running through 
the thrift proxy:

(Note the crude handling of the max thrift message size.)
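A minimal sketch of such a batching loop follows (an illustration, not the listing referred to above). It assumes cells are plain (row, column, value) byte tuples and that a caller-supplied `send_batch` callable delivers one batch through the proxy. `Cell`, `estimate_size`, `batches_under_limit`, `ingest`, and the 1,000,000-byte default are all hypothetical names and values, and the size estimate ignores Thrift framing overhead, so a real limit would need some headroom.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical cell representation: (row, column, value), all bytes.
Cell = Tuple[bytes, bytes, bytes]


def estimate_size(cell: Cell) -> int:
    """Rough payload size of one cell; ignores Thrift framing overhead."""
    return sum(len(part) for part in cell)


def batches_under_limit(cells: Iterable[Cell], max_bytes: int) -> Iterator[List[Cell]]:
    """Group cells into batches whose estimated size stays at or under max_bytes.

    A single cell larger than max_bytes is emitted alone rather than dropped.
    """
    batch: List[Cell] = []
    batch_bytes = 0
    for cell in cells:
        size = estimate_size(cell)
        # Close the current batch before it would exceed the limit.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(cell)
        batch_bytes += size
    if batch:
        yield batch


def ingest(
    cells: Iterable[Cell],
    send_batch: Callable[[List[Cell]], None],
    max_bytes: int = 1_000_000,  # assumed limit, not a documented default
) -> int:
    """Send every cell through send_batch; return the number of batches sent."""
    sent = 0
    for batch in batches_under_limit(cells, max_bytes):
        send_batch(batch)  # stand-in for the actual proxy write call
        sent += 1
    return sent
```

Calling `ingest(cells, send_batch)` with a function that wraps the proxy client's write call would push the cells through in size-bounded chunks. Each chunk is still a separate round trip through the proxy.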

If there is a logical difference, perhaps it would be worthwhile to 
translate the Java BatchWriter into C so there can be native support for 
C/C++/Python applications doing high-speed bulk ingest?

Thanks for your thoughts on this.

