accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient
Date Wed, 14 Oct 2015 16:14:05 GMT


Eric Newton commented on ACCUMULO-4028:

May want to use Read/Write locks to eliminate some of the contention in ZooCache.
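
For illustration only, here is a minimal Java sketch of the read/write lock idea. It is not the actual ZooCache code: the class name ReadMostlyCache and the loader function (standing in for a ZooKeeper read) are hypothetical. It assumes a read-mostly cache where hits only need the shared read lock, and where the slow remote read on a miss happens with no lock held, so one slow lookup does not stall other readers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical stand-in for a ZooKeeper-backed cache; NOT the real ZooCache API.
class ReadMostlyCache {
  private final Map<String, byte[]> cache = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  // Assumption: the loader represents a (slow) remote read from ZooKeeper.
  private final Function<String, byte[]> loader;

  ReadMostlyCache(Function<String, byte[]> loader) {
    this.loader = loader;
  }

  byte[] get(String path) {
    // Fast path: any number of readers can hold the read lock at once.
    lock.readLock().lock();
    try {
      byte[] hit = cache.get(path);
      if (hit != null) {
        return hit;
      }
    } finally {
      lock.readLock().unlock();
    }

    // Miss: perform the slow remote read while holding no lock at all.
    byte[] loaded = loader.apply(path);

    // Take the exclusive write lock only for the brief map update.
    lock.writeLock().lock();
    try {
      // Another thread may have filled the entry meanwhile; keep the first value.
      byte[] existing = cache.putIfAbsent(path, loaded);
      return existing != null ? existing : loaded;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

The trade-off in this sketch is that two threads missing on the same path may both call the loader; in exchange, no thread ever waits on a lock while a remote call is in flight.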

> ServerClient getConnection is inefficient
> -----------------------------------------
>                 Key: ACCUMULO-4028
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.5, 1.5.4, 1.6.4, 1.7.0
>         Environment: Large production environment.
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.6.5, 1.7.1, 1.8.0
> Several bulk load FATE operations were taking a long time, but actual bulk load statistics were quite good.
> The master bulk load threads were stuck in LoadFiles, specifically trying to get a connection to a random tablet server.
> The method to get a random connection looks at all the tablet server locks in ZooKeeper. On a large cluster (say, one with more than 1000 nodes), this is a lot of lookups in ZooKeeper, and it is done for every file to be bulk loaded.
> Normally, these lookups would be cached in zooCache, and subsequent lookups would all be served from local memory. But the cache is a singleton in the master, so other activities, especially those that make RPC calls to ZooKeeper while holding the lock, will delay these lookups.
> The master has a list of the active tablet servers. It can pick one at random and create a new connection to it, making potentially thousands fewer calls to zooCache for each file to be loaded.
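
As a rough illustration of the proposal in the quoted description (not the committed fix), picking a server from an in-memory list needs no ZooKeeper or cache access at all. The class and method names below are hypothetical, and server addresses are modeled as plain strings for simplicity.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical helper; names are illustrative and not taken from Accumulo.
class RandomServerPicker {
  // Returns a random member of the in-memory set of live servers,
  // or Optional.empty() when no servers are currently known.
  static Optional<String> pick(Collection<String> activeServers, Random random) {
    if (activeServers.isEmpty()) {
      return Optional.empty();
    }
    // Copy to a list so an element can be selected by index.
    List<String> snapshot = new ArrayList<>(activeServers);
    return Optional.of(snapshot.get(random.nextInt(snapshot.size())));
  }
}
```

A caller would then open a connection to the chosen address and retry with another pick if that connection fails, since an in-memory list can be slightly stale.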

This message was sent by Atlassian JIRA
