accumulo-user mailing list archives

From William Slacum <wilhelm.von.cl...@accumulo.net>
Subject Re: Optimal # proxy servers
Date Mon, 11 Aug 2014 12:32:58 GMT
Going through the proxy will always be an extra RPC step over using a Java
client. Eliminating that step, I think, would net the most benefit.


On Mon, Aug 11, 2014 at 12:16 AM, John R. Frank <jrf@diffeo.com> wrote:

>
> Josh,
>
> Following up on this earlier post about the proxy:
>
> http://www.mail-archive.com/user%40accumulo.apache.org/msg03445.html
>
>
>
> On 4/14/14, 1:38 PM, Josh Elser wrote:
>
>> If you care about maximizing your throughput, ingest is probably not
>> desirable through the proxy (you can probably get ~10x faster using the
>> Java BatchWriter API).
>>
>> Hrm. 10x may have been overstating too. 5x is probably more accurate.
>> YMMV :)
>>
>
>
>
> Is there something more than the extra network hop that makes the proxy
> slow?  The proxy exposes a BatchWriter interface:
>
> https://github.com/accumulo/pyaccumulo/blob/master/README.md#writing-mutations-with-a-batchwriter-batched-and-optimized-for-throughput
>
> So, we can batch up multiple requests through the proxy.  Is there
> something else that is only available (only possible?) by going direct
> instead of through the proxy?
>
> For example, is there a logical difference between what can be done with
> the Java BatchWriter API and this kind of batching loop running through the
> thrift proxy:
>
> https://github.com/diffeo/kvlayer/blob/master/kvlayer/_accumulo.py#L149
>
> (Note the crude handling of the max thrift message size.)
>
> If there is a logical difference, perhaps it would be worthwhile to
> translate the Java BatchWriter into C so there can be native support for
> C/C++/Python applications doing high-speed bulk ingest?
>
>
> Thanks for your thoughts on this.
>
>
> Regards,
> John
>
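
For reference, the kind of size-capped batching loop John describes can be
sketched roughly as below. This is only an illustration, not the kvlayer or
pyaccumulo code: chunk_by_size, max_bytes and size_of are hypothetical names,
and the actual call that ships each batch through the proxy is left out
because it depends on the client in use. The sketch only shows how to group
items so that each request stays under an assumed message-size budget.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunk_by_size(
    items: Iterable[T],
    max_bytes: int,
    size_of: Callable[[T], int],
) -> Iterator[List[T]]:
    """Group items into batches whose estimated total size is <= max_bytes.

    Hypothetical helper: size_of is whatever size estimate the caller
    trusts (for raw bytes, len is enough). An item that is larger than
    max_bytes on its own is emitted as a single-item batch rather than
    being dropped.
    """
    batch: List[T] = []
    batch_bytes = 0
    for item in items:
        item_bytes = size_of(item)
        # Flush before adding if this item would push the batch over budget.
        if batch and batch_bytes + item_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(item)
        batch_bytes += item_bytes
    # Flush whatever is left after the input is exhausted.
    if batch:
        yield batch


if __name__ == "__main__":
    values = [b"aaa", b"bb", b"cccc", b"d"]
    for group in chunk_by_size(values, max_bytes=5, size_of=len):
        # In real use, one proxy request would be sent per group here.
        print(group)
```

Each group still costs one round trip through the proxy, so this kind of
loop reduces the per-mutation overhead but does not remove the extra RPC
step itself.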
