accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4028) ServerClient getConnection is inefficient
Date Wed, 14 Oct 2015 16:35:05 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14957229#comment-14957229 ]

Josh Elser commented on ACCUMULO-4028:
--------------------------------------

bq. And this is done for every file to be bulk loaded.

bq. The master has a list of the active tablet servers. It can pick one at random and create
a new connection to it, using potentially thousands fewer calls to the zooCache for each
file to be loaded.

Would it make sense for the BulkImporter to also batch its calls to the master, getting many
random tservers at one time instead of one for each file as it processes it? Batching increases
the likelihood that a server dies before we get to use it, but that should be rare and should
already be retried automatically (I hope).
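
To make the batching idea concrete, here is a minimal sketch. It assumes the caller already
holds a snapshot of live tserver addresses as plain strings. The class TserverBatchSketch, the
methods pickBatch and assign, and the isAlive check are all hypothetical names for illustration,
not Accumulo APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of Accumulo. */
final class TserverBatchSketch {

  /** Returns up to batchSize distinct servers chosen at random from one snapshot. */
  static List<String> pickBatch(List<String> liveServers, int batchSize, Random rng) {
    List<String> shuffled = new ArrayList<>(liveServers); // copy; the caller's list is untouched
    Collections.shuffle(shuffled, rng);
    int n = Math.max(0, Math.min(batchSize, shuffled.size()));
    return new ArrayList<>(shuffled.subList(0, n));
  }

  /**
   * Assigns each file to a server from the batch in round-robin order, skipping servers
   * that isAlive rejects. Files for which no server in the batch is alive are left out of
   * the result, so the caller can retry them with a fresh batch.
   */
  static Map<String, String> assign(List<String> files, List<String> batch,
      Predicate<String> isAlive) {
    Map<String, String> assignments = new LinkedHashMap<>();
    if (batch.isEmpty()) {
      return assignments;
    }
    int next = 0;
    for (String file : files) {
      String chosen = null;
      // Try each server in the batch at most once per file.
      for (int tries = 0; tries < batch.size() && chosen == null; tries++) {
        String candidate = batch.get(next % batch.size());
        next++;
        if (isAlive.test(candidate)) {
          chosen = candidate;
        }
      }
      if (chosen != null) {
        assignments.put(file, chosen);
      }
    }
    return assignments;
  }
}
```

The batch is fetched once and reused across files, so a server that dies after the batch is
drawn is only noticed when it is used. That is the staleness trade-off mentioned above.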

> ServerClient getConnection is inefficient
> -----------------------------------------
>
>                 Key: ACCUMULO-4028
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4028
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 1.4.5, 1.5.4, 1.6.4, 1.7.0
>         Environment: Large production environment.
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>             Fix For: 1.6.5, 1.7.1, 1.8.0
>
>
> Several bulk load FATE operations were taking a long time, but actual bulk load statistics
> were quite good.
> The master bulk load threads were stuck in LoadFiles, specifically trying to get a connection
> to a random tablet server.
> The method to get a random connection looks at all the tablet server locks in ZooKeeper.
> On a large cluster (say, one with more than 1000 nodes), this is a lot of lookups in ZooKeeper.
> And this is done for every file to be bulk loaded.
> Normally, these lookups would be cached in zooCache, and the next lookup would be served
> entirely from local memory. But the cache is a singleton in the master, so other activities,
> especially those that make RPC calls to ZooKeeper while holding the lock, will delay these
> lookups.
> The master has a list of the active tablet servers. It can pick one at random and create
> a new connection to it, using potentially thousands fewer calls to the zooCache for each
> file to be loaded.
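
The difference in lookup counts described above can be illustrated with a toy model. This is
only a sketch under stated assumptions: CountingLockCache is a hypothetical stand-in for a shared
cache that counts the lookups it serves, and the scan is assumed to make one lookup per tablet
server. None of these names are Accumulo classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model for illustration only; not Accumulo code. */
final class RandomServerLookupSketch {

  /** Hypothetical stand-in for a shared lock cache; counts every lookup it serves. */
  static final class CountingLockCache {
    private final List<String> servers;
    final AtomicLong lookups = new AtomicLong();

    CountingLockCache(List<String> servers) {
      this.servers = new ArrayList<>(servers);
    }

    int size() {
      return servers.size();
    }

    String lockHolder(int index) {
      lookups.incrementAndGet(); // assumed cost: one cache lookup per server examined
      return servers.get(index);
    }
  }

  /** Scan-style pick: consults the cache once per server, then chooses one at random. */
  static String pickByScanning(CountingLockCache cache, Random rng) {
    List<String> found = new ArrayList<>();
    for (int i = 0; i < cache.size(); i++) {
      found.add(cache.lockHolder(i));
    }
    if (found.isEmpty()) {
      throw new IllegalStateException("no tablet servers found");
    }
    return found.get(rng.nextInt(found.size()));
  }

  /** List-style pick: chooses directly from an in-memory snapshot, with no cache lookups. */
  static String pickFromLiveList(List<String> liveServers, Random rng) {
    if (liveServers.isEmpty()) {
      throw new IllegalStateException("no live tablet servers");
    }
    return liveServers.get(rng.nextInt(liveServers.size()));
  }
}
```

Under this model, loading F files on a cluster of N servers costs N * F cache lookups with the
scan-style pick and none with the list-style pick.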



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
