accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Query Services Layer Question
Date Tue, 20 May 2014 02:45:37 GMT
Hi Jeff,

Not a rookie question at all. This is an area of the API where we know
we could make the lifecycle of Instances and Connectors more obvious. We
have a ticket somewhere for it.

If you're using a single user/password to connect to Accumulo (not
special accounts per QSL client), there's no reason you can't reuse
Connectors. The number of Connectors you want to cache likely scales
with the concurrent user load of your service.
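A minimal sketch of reusing Connectors, assuming one connector per Accumulo principal. `ConnectorCache` is a hypothetical helper, not an Accumulo class; to keep the sketch compilable without the Accumulo jars, the connector type is a type parameter and the factory is a plain `java.util.function.Function`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not an Accumulo class): caches one connector per
 * principal so repeated requests from the same user reuse one object.
 * C stands in for org.apache.accumulo.core.client.Connector.
 */
final class ConnectorCache<C> {
    private final ConcurrentHashMap<String, C> byPrincipal = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    ConnectorCache(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the cached connector for this principal, creating it on first use. */
    C get(String principal) {
        // ConcurrentHashMap.computeIfAbsent applies the factory at most once per
        // absent key, even when many request threads ask at the same time.
        return byPrincipal.computeIfAbsent(principal, factory);
    }

    /** Number of distinct principals currently cached. */
    int size() {
        return byPrincipal.size();
    }
}
```

With a single QSL account the map holds exactly one entry. In a real service the factory would wrap `instance.getConnector(user, token)` and rethrow its checked `AccumuloException`/`AccumuloSecurityException` as unchecked exceptions, since a `Function` cannot throw them.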

The fun part here is that each Connector retains a reference to the
Instance that created it and uses it internally. There are synchronized
calls inside each ZooKeeperInstance, which may start to degrade
performance once more than about 50 threads access it concurrently
(ballpark guess).
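One way to act on this, sketched under assumptions: keep a small fixed pool of shared objects and hand them out round-robin, so no single ZooKeeperInstance sees every request thread. `RoundRobinPool` is hypothetical; `T` stands in for an Instance (or a Connector built from its own Instance). Members are shared rather than checked out, so each one must tolerate concurrent use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-size pool that spreads callers across several shared
 * objects in round-robin order, so load on any one member is roughly
 * (total threads / pool size).
 */
final class RoundRobinPool<T> {
    private final List<T> members;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(int size, Supplier<T> factory) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<T> created = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            created.add(factory.get()); // e.g. one ZooKeeperInstance per slot
        }
        this.members = Collections.unmodifiableList(created);
    }

    /** Returns the next member; floorMod keeps the index valid after int overflow. */
    T next() {
        return members.get(Math.floorMod(next.getAndIncrement(), members.size()));
    }

    int size() {
        return members.size();
    }
}
```

Round-robin is a deliberately simple choice here; it does not track which member is busiest, it just keeps the per-member thread count near the average.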

You also do not want to create a new ZooKeeperInstance for every request,
as you're doing now. I believe that will cause you some Java heap issues
due to some nitty-gritty ZooKeeper details (ask if you're actually
interested).
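If one shared Instance is enough for your load, a minimal sketch of creating it once per JVM instead of once per request is the initialization-on-demand holder idiom. `SharedAccumulo` and `SharedInstance` are hypothetical stand-ins for a real `ZooKeeperInstance`, and the instance name and ZooKeeper host below are placeholder values.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical application-wide holder. The JVM initializes the nested
 * Holder class only on first access to Holder.INSTANCE, and class
 * initialization is thread-safe, so exactly one SharedInstance is created
 * no matter how many request threads call get().
 */
final class SharedAccumulo {
    /** Stand-in for ZooKeeperInstance; counts constructions for illustration. */
    static final class SharedInstance {
        static final AtomicInteger CREATED = new AtomicInteger();
        final String instanceName;
        final String zooServers;

        SharedInstance(String instanceName, String zooServers) {
            this.instanceName = instanceName;
            this.zooServers = zooServers;
            CREATED.incrementAndGet();
        }
    }

    private SharedAccumulo() {}

    private static final class Holder {
        // Placeholder settings; a real service would read these from its configuration.
        static final SharedInstance INSTANCE = new SharedInstance("myInstance", "zk1:2181");
    }

    /** Every request thread gets the same lazily created instance. */
    static SharedInstance get() {
        return Holder.INSTANCE;
    }
}
```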

In summary, definitely cache ZooKeeperInstances, but size that cache
relative to the number of users. Connectors can be cached too, but they
share Instances under the hood. HTTP benchmarking tools like JMeter, run
with various client pool sizes, should help you balance these numbers.
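One hedged way to turn "some number relative to the number of users" into a starting point for those benchmarks: round up the expected number of concurrent request threads divided by a per-Instance thread target. `PoolSizing` is hypothetical, and its default of 50 only echoes the ballpark guess in this reply, not a measured Accumulo limit.

```java
/** Hypothetical helper for choosing a starting pool size to refine with load tests. */
final class PoolSizing {
    /** Echoes the ~50-thread ballpark guess; not a measured Accumulo limit. */
    static final int DEFAULT_THREADS_PER_INSTANCE = 50;

    private PoolSizing() {}

    /** Ceiling of concurrentThreads / threadsPerInstance, with a floor of one Instance. */
    static int instancesFor(int concurrentThreads, int threadsPerInstance) {
        if (threadsPerInstance < 1) {
            throw new IllegalArgumentException("threadsPerInstance must be >= 1");
        }
        if (concurrentThreads <= 0) {
            return 1; // keep at least one shared Instance even when idle
        }
        // Widen to long so the rounding-up addition cannot overflow int.
        return (int) (((long) concurrentThreads + threadsPerInstance - 1) / threadsPerInstance);
    }

    static int instancesFor(int concurrentThreads) {
        return instancesFor(concurrentThreads, DEFAULT_THREADS_PER_INSTANCE);
    }
}
```

For an expected peak of 120 concurrent threads, `instancesFor(120)` gives 3; JMeter runs at and around that size should then confirm or replace the guess.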

Hope this helps.

- Josh

On 5/19/14, 10:29 PM, Jeff Schwartz wrote:
> Rookie question...  I've built a Query Service Layer (QSL) according to
> the documentation from the Accumulo v1.6.0 User Manual.  My question is
> how often I should be getting a ZooKeeperInstance and Connector to
> Accumulo.  For example, here's some pseudocode for a typical service in
> my QSL.
> public void readTable(...) {
>      Instance instance = new ZooKeeperInstance(accumuloInstanceName, zooServers);
>      Connector connector = instance.getConnector(username, passwordToken);
>      Scanner scanner = connector.getScanner(tableName, auths);
>      scanner.setRange(range);
>      for (Map.Entry<Key,Value> entry : scanner) {
>        ...
>      }
>      scanner.close();
> }
> If I run these lines of code for every call in my RESTful service, then I
> feel like that is generating a lot of extra connections to both
> ZooKeeper and Accumulo.  Additionally, I would assume that this will
> have a negative impact on performance.  Should I cache any Connectors or
> ZooKeeperInstances?
> Any suggestions or best practices would be greatly appreciated.
> Thanks in advance.
> Sincerely,
> Jeff Schwartz
