accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-4028) ServerClient getConnection is inefficient
Date Wed, 14 Oct 2015 16:09:05 GMT
Eric Newton created ACCUMULO-4028:

             Summary: ServerClient getConnection is inefficient
                 Key: ACCUMULO-4028
             Project: Accumulo
          Issue Type: Bug
          Components: client
    Affects Versions: 1.7.0, 1.6.4, 1.5.4, 1.4.5
         Environment: Large production environment.
            Reporter: Eric Newton
            Assignee: Eric Newton
             Fix For: 1.6.5, 1.7.1, 1.8.0

Several bulk load FATE operations were taking a long time, but actual bulk load statistics
were quite good.

The master bulk load threads were stuck in LoadFiles, specifically trying to get a connection
to a random tablet server.

The method to get a random connection looks at all the tablet server locks in ZooKeeper. On
a large cluster (say, one with more than 1000 nodes), this is a lot of ZooKeeper lookups,
and it is done for every file to be bulk loaded.
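
To make the scale concrete, here is a rough cost model. It assumes one lookup per tablet server lock per file, which is a simplification of the description above; the class name, method name, and the file count are hypothetical and not taken from Accumulo.

```java
/** Hypothetical back-of-the-envelope model; not Accumulo code. */
final class LookupCostModel {
    private LookupCostModel() {}

    /**
     * Estimated lookups when every bulk-loaded file triggers a scan of
     * every tablet server lock (assumed: one lookup per lock).
     */
    static long scanLookups(int tabletServers, int files) {
        // Widen to long before multiplying so large clusters do not overflow.
        return (long) tabletServers * files;
    }

    public static void main(String[] args) {
        int tabletServers = 1000; // cluster size mentioned in the report
        int files = 500;          // assumed number of files in one bulk load
        System.out.println("estimated lookups: " + scanLookups(tabletServers, files));
    }
}
```

Under this model the cost grows with the product of cluster size and file count, which is why the problem only shows up on large clusters.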

Normally, these lookups would be cached in zooCache, and subsequent lookups would all
be served from local memory.  But the cache is a singleton in the master, so other activities, especially
those that make RPC calls to ZooKeeper while holding the cache lock, will delay these lookups.

The master has a list of the active tablet servers. It can pick one at random and create a
new connection to it, making potentially thousands fewer calls to the zooCache for each
file to be loaded.
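
A minimal sketch of the proposed approach, assuming the master can expose its active tablet servers as a collection of address strings. RandomServerPicker and pick are hypothetical names for illustration, not the actual Accumulo API or the eventual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.Random;

/** Hypothetical helper; not part of the Accumulo API. */
final class RandomServerPicker {
    private RandomServerPicker() {}

    /**
     * Picks one address uniformly at random from an in-memory collection of
     * live tablet servers. No ZooKeeper or zooCache lookups are involved.
     */
    static Optional<String> pick(Collection<String> liveServers, Random random) {
        // Copy first so the emptiness check and the index use the same view,
        // even if the caller's collection changes afterwards.
        List<String> snapshot = new ArrayList<>(liveServers);
        if (snapshot.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(snapshot.get(random.nextInt(snapshot.size())));
    }

    public static void main(String[] args) {
        // Assumed example addresses; a real master would supply its own list.
        List<String> live = Arrays.asList("tserver1:9997", "tserver2:9997", "tserver3:9997");
        System.out.println(pick(live, new Random()).orElse("no live tablet servers"));
    }
}
```

The caller would then open a connection to the returned address. The selection itself costs one list copy and one random index per file, independent of how busy the shared cache is.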

