accumulo-dev mailing list archives

From Adam Fuchs
Subject Re: C++ accumulo client --> native clients for Python, Go, Ruby etc
Date Wed, 08 Oct 2014 18:04:50 GMT

Do you have any performance numbers that you can share around your use
of the existing proxy solution? One of the reasons that Thrift is
performant for Accumulo is that messages are batched by the client
library and sent over a smaller number of RPCs. A C++ client will also
need to have mechanisms like the BatchWriter to get the best
performance. Also, it may be possible to make the proxy faster by
batching more data into the update call.
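
To make the batching point concrete, here is a minimal Python sketch of a
client-side buffer that collects mutations and hands them to the transport in
groups. It is an illustration only: BufferedWriter, send_batch,
max_buffer_bytes, and count_rpcs are hypothetical names, not part of the
Accumulo client or proxy API, and the size accounting is a rough
approximation.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a "mutation" is just (row, column, value).
Mutation = Tuple[str, str, bytes]


class BufferedWriter:
    """Illustrative BatchWriter-like buffer (not Accumulo's actual API)."""

    def __init__(self, send_batch: Callable[[List[Mutation]], None],
                 max_buffer_bytes: int = 1_000_000) -> None:
        # send_batch stands in for one RPC carrying many mutations (assumed).
        self._send_batch = send_batch
        self._max_buffer_bytes = max_buffer_bytes
        self._buffer: List[Mutation] = []
        self._buffered_bytes = 0

    def add(self, mutation: Mutation) -> None:
        row, column, value = mutation
        self._buffer.append(mutation)
        # Rough size estimate: character/byte counts of the three fields.
        self._buffered_bytes += len(row) + len(column) + len(value)
        if self._buffered_bytes >= self._max_buffer_bytes:
            self.flush()

    def flush(self) -> None:
        # Send everything buffered so far as a single call, then reset.
        if self._buffer:
            self._send_batch(self._buffer)
            self._buffer = []
            self._buffered_bytes = 0

    def close(self) -> None:
        # Make sure a partially filled buffer is not lost.
        self.flush()


def count_rpcs(num_mutations: int, max_buffer_bytes: int) -> int:
    """Count how many send_batch calls a given buffer size produces."""
    calls: List[List[Mutation]] = []
    writer = BufferedWriter(calls.append, max_buffer_bytes)
    for i in range(num_mutations):
        # Each mutation is 24 "bytes" under the estimate above (9 + 5 + 10).
        writer.add((f"row{i:06d}", "cf:cq", b"x" * 10))
    writer.close()
    return len(calls)
```

With a 1-byte threshold every mutation becomes its own call, while a larger
threshold sends the same mutations in far fewer calls. A real client would
also need a flush timeout and error handling, which this sketch leaves out.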


On Mon, Oct 6, 2014 at 5:20 PM, John R. Frank wrote:
> Two kinds of gains:
> 1) Single-client throughput: the proxy adds an extra RPC hop in which each message is
> deserialized and then reserialized. With the proxy running locally, the extra network hop is
> less of an issue. This was discussed on the user list (see link earlier in this thread), and a
> 5x slowdown was suggested as a possible SWAG estimate.
> 2) Cluster management complexity: it's clearly best to have the proxy local to the workers,
> but if you have a worker on every core of a large box (e.g. 32), then a single proxy
> on each worker machine becomes a bottleneck. Running many proxies in a single JVM is the next
> thing we could try to improve this, but having a native client seems preferable.
> Comments?
> jrf
>> On Oct 6, 2014, at 4:15 PM, David Medinets wrote:
>> How far away from the theoretical maximum rate is the Thrift protocol?
>> What kind of gain is expected from the native C++ approach?
>>> On Sat, Oct 4, 2014 at 12:56 PM, John R. Frank wrote:
>>> Accumulo Developers,
>>> We're trying to boost the throughput of non-Java tools with Accumulo. It seems that
>>> the lowest-hanging fruit is to stop using the Thrift proxy. Per the discussion about Python
>>> and the Thrift proxy on the users list [1], I'm wondering if anyone is interested in helping
>>> with a native C++ client. There is a start on one here [2]. We could offer a bounty or maybe
>>> make it a consulting project, depending on who is interested.
>>> We also looked at running a separate Thrift proxy for every worker thread
>>> or process. With many cores on a box, e.g. 32, it just doesn't seem practical to run that
>>> many proxies, even if they all run in a single JVM. We'd be glad to hear ideas on that front too.
>>> A potentially big benefit of making a proper C++ Accumulo client is that it is
>>> straightforward to expose native interfaces in Python (via PyObject), Go [3], Ruby [4], and
>>> other languages.
>>> Thanks for any advice, pointers, interest.
>>> John
>>> 1--
>>> 2--
>>> 3--
>>> 4--
