accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Created] (ACCUMULO-3268) HoldTimeoutException is poorly propagated to clients
Date Mon, 27 Oct 2014 21:42:34 GMT
Josh Elser created ACCUMULO-3268:

             Summary: HoldTimeoutException is poorly propagated to clients
                 Key: ACCUMULO-3268
             Project: Accumulo
          Issue Type: Improvement
          Components: client
    Affects Versions: 1.6.1
            Reporter: Josh Elser
            Priority: Critical
             Fix For: 1.6.2, 1.7.0

A 6-node cluster was running randomwalk when the MultiTable module failed. A BatchWriter was
trying to add a new Mutation to a table in {{o.a.a.test.randomwalk.multitable.Write}}. The
call to {{addMutations}} failed with a MutationsRejectedException that reported only that
there was an exception on the server.

In actuality, adding this mutation triggered a flush, which tried to ship the buffered mutations
to a tabletserver. The tabletserver hosting the tablet for that mutation was under load but still
responsive. The hold time was exceeded for this tserver, but all the client sees is that there
was *some* exception on this server.

If the client actually *knew* that commits were being held, it could correctly back off (sleep)
and retry the mutations added since the last flush. Right now, the client can't really do anything.
Additionally, being unable to get the mutations that were buffered since the last flush is sub-par,
but that can be worked around.
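
A minimal sketch of the back-off-and-retry behavior described above, assuming the client could
tell a hold timeout apart from other server errors. Nothing here is Accumulo API: the
{{CommitsHeldException}} type, the {{MutationSink}} interface, and the {{flushWithBackoff}} helper
are hypothetical stand-ins, and the doubling sleep is just one possible back-off policy.

```java
import java.util.ArrayList;
import java.util.List;

public class HoldAwareRetry {

  /** Hypothetical typed signal that the server is holding commits (not an Accumulo class). */
  static class CommitsHeldException extends Exception {
    CommitsHeldException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical stand-in for whatever ships buffered mutations to a tablet server. */
  interface MutationSink {
    void flush(List<String> mutations) throws CommitsHeldException;
  }

  /**
   * Tries to flush the buffered mutations, sleeping and retrying while commits are held.
   * The sleep doubles after each held attempt. Returns the number of attempts used and
   * rethrows the last CommitsHeldException once maxAttempts is exhausted.
   */
  static int flushWithBackoff(MutationSink sink, List<String> buffered,
      int maxAttempts, long initialSleepMs)
      throws CommitsHeldException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    long sleepMs = initialSleepMs;
    for (int attempt = 1;; attempt++) {
      try {
        sink.flush(buffered);
        return attempt;
      } catch (CommitsHeldException e) {
        if (attempt >= maxAttempts) {
          throw e; // give up and let the caller decide what to do
        }
        Thread.sleep(sleepMs); // back off before resending the same mutations
        sleepMs *= 2;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated server that holds commits for the first two flush attempts.
    final int[] calls = {0};
    MutationSink heldTwice = mutations -> {
      if (++calls[0] < 3) {
        throw new CommitsHeldException("commits are being held");
      }
    };

    List<String> buffered = new ArrayList<>();
    buffered.add("row1");
    buffered.add("row2");

    int attempts = flushWithBackoff(heldTwice, buffered, 5, 10L);
    System.out.println("flushed after " + attempts + " attempts");
  }
}
```

The sketch only works because the caller still holds the buffered mutations between attempts,
which is the second gap noted above.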
