accumulo-user mailing list archives

From "John R. Frank" <...@diffeo.com>
Subject Re: Optimal # proxy servers
Date Mon, 11 Aug 2014 04:16:26 GMT

Josh,

Following up on this earlier post about the proxy:

http://www.mail-archive.com/user%40accumulo.apache.org/msg03445.html



On 4/14/14, 1:38 PM, Josh Elser wrote:

> If you care about maximizing your throughput, ingest is probably not 
> desirable through the proxy (you can probably get ~10x faster using the 
> Java BatchWriter API).

> Hrm. 10x may have been overstating too. 5x is probably more accurate. 
> YMMV :)



Is there something more than the extra network hop that makes the proxy 
slow?  The proxy exposes a BatchWriter interface:

https://github.com/accumulo/pyaccumulo/blob/master/README.md#writing-mutations-with-a-batchwriter-batched-and-optimized-for-throughput

So, we can batch up multiple requests through the proxy.  Is there 
something else that is only available (only possible?) by going direct 
instead of through the proxy?

For example, is there a logical difference between what can be done with 
the Java BatchWriter API and this kind of batching loop running through 
the Thrift proxy:

https://github.com/diffeo/kvlayer/blob/master/kvlayer/_accumulo.py#L149

(Note the crude handling of the max Thrift message size.)
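
To make the question concrete, here is a minimal sketch of the kind of size-capped batching loop I mean. It is not the kvlayer code linked above and it does not call pyaccumulo. The `Mutation` tuple, `estimate_size`, `send_batch`, and the 1,000,000-byte default are all hypothetical stand-ins: the default is an arbitrary placeholder rather than the real Thrift limit, and the size estimate ignores Thrift's own encoding overhead.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for a mutation: (row, column, value) as raw bytes.
Mutation = Tuple[bytes, bytes, bytes]


def estimate_size(mutation: Mutation) -> int:
    """Rough payload size in bytes; ignores Thrift framing overhead."""
    row, column, value = mutation
    return len(row) + len(column) + len(value)


def batch_by_size(mutations: Iterable[Mutation],
                  max_bytes: int) -> Iterator[List[Mutation]]:
    """Group mutations into batches whose estimated size stays under max_bytes.

    A single mutation larger than max_bytes is still emitted, alone in its
    own batch, so the caller can decide how to handle it.
    """
    batch: List[Mutation] = []
    batch_bytes = 0
    for mutation in mutations:
        size = estimate_size(mutation)
        # Flush before adding if this mutation would push the batch over.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(mutation)
        batch_bytes += size
    if batch:
        yield batch


def ingest(mutations: Iterable[Mutation],
           send_batch: Callable[[List[Mutation]], None],
           max_bytes: int = 1_000_000) -> int:
    """Send every batch through send_batch (assumed to wrap one proxy
    round trip) and return the number of batches sent."""
    sent = 0
    for batch in batch_by_size(mutations, max_bytes):
        send_batch(batch)
        sent += 1
    return sent
```

In this sketch each call to `send_batch` would correspond to one message through the proxy, so the only tuning knob is `max_bytes`. My question is whether the Java BatchWriter does anything beyond this that cannot be reproduced on the client side of the proxy.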

If there is a logical difference, perhaps it would be worthwhile to 
translate the Java BatchWriter into C so there can be native support for 
C/C++/Python applications doing high-speed bulk ingest?


Thanks for your thoughts on this.


Regards,
John
