accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3268) HoldTimeoutException is poorly propagated to clients
Date Mon, 27 Oct 2014 23:13:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185995#comment-14185995 ]

Josh Elser commented on ACCUMULO-3268:
--------------------------------------

It would be nice to provide a RetryPolicy. At a minimum, the client needs some mechanism to
learn that {{these_mutations}} failed to be applied in a scenario that *could* be retried.
A policy that retries them automatically would be even nicer, but giving clients the tools
to do it themselves is a very big first step.
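
To make the idea concrete, here is a rough sketch of what such a policy and a "retryable"
failure could look like. Everything in it is hypothetical: {{RetryPolicy}},
{{ExponentialBackoffPolicy}} and {{RetryableWriteException}} are illustrative names, not
existing Accumulo classes, and mutations are stood in for by plain strings so the snippet
compiles on its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical policy (not part of Accumulo): how long to wait before a retry. */
interface RetryPolicy {
    /** Delay in milliseconds before retry number attempt (1-based), or -1 to give up. */
    long nextDelayMillis(int attempt);
}

/** Doubles the delay on each retry, capped at maxDelayMillis, for at most maxRetries retries. */
final class ExponentialBackoffPolicy implements RetryPolicy {
    private final int maxRetries;
    private final long initialDelayMillis;
    private final long maxDelayMillis;

    ExponentialBackoffPolicy(int maxRetries, long initialDelayMillis, long maxDelayMillis) {
        this.maxRetries = maxRetries;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    @Override
    public long nextDelayMillis(int attempt) {
        if (attempt < 1 || attempt > maxRetries) {
            return -1; // out of retries: the caller should surface the failure
        }
        long delay = initialDelayMillis;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 1; i < attempt && delay < maxDelayMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMillis);
    }
}

/** Hypothetical exception that names the failed mutations and marks them as retryable. */
final class RetryableWriteException extends Exception {
    private final List<String> failedMutations;

    RetryableWriteException(String message, List<String> failedMutations) {
        super(message);
        // Defensive copy so later changes by the caller do not alter the report.
        this.failedMutations = Collections.unmodifiableList(new ArrayList<>(failedMutations));
    }

    List<String> getFailedMutations() {
        return failedMutations;
    }
}
```

With something like this, a client that only gets the exception can still drive its own
retries, and a built-in policy becomes an optional convenience on top.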

> HoldTimeoutException is poorly propagated to clients
> ----------------------------------------------------
>
>                 Key: ACCUMULO-3268
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3268
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.6.1
>            Reporter: Josh Elser
>            Priority: Critical
>             Fix For: 1.6.2, 1.7.0
>
>
> 6 node cluster was running randomwalk when the MultiTable module failed. A BatchWriter
> was trying to add a new Mutation to a table in {{o.a.a.test.randomwalk.multitable.Write}}.
> The call to {{addMutations}} failed with a MutationsRejectedException with the information
> that there was an exception on the server.
> In actuality, the addition of this mutation triggered a flush and tried to ship it over
> to a tabletserver. The tabletserver hosting the tablet for that mutation was under load but
> still responsive. The hold time was exceeded for this tserver, but all the client sees is
> that there was *some* exception on this server.
> If the client actually *knew* that commits were being held, it could correctly back off
> (sleep) and retry the mutations since the last flush. Right now, they can't really do anything.
> Additionally, being unable to get the mutations that were buffered since the last flush is
> sub-par, but that can be worked around.
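
The workaround hinted at in the description (keep your own copy of the mutations added
since the last flush, then sleep and replay them when commits are held) might look roughly
like the sketch below. It is written under loud assumptions: {{MutationSink}} is a
hypothetical stand-in for a writer and is not Accumulo's BatchWriter API,
{{CommitsHeldException}} is a hypothetical signal that the client cannot actually
distinguish today, and replaying a rejected batch is assumed to be safe for the
application.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical signal that the server is holding commits; no such client exception exists today. */
final class CommitsHeldException extends Exception {
    CommitsHeldException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for a writer; this is not Accumulo's BatchWriter API. */
interface MutationSink<M> {
    void add(M mutation) throws CommitsHeldException;

    void flush() throws CommitsHeldException;
}

/** Remembers every mutation added since the last successful flush so it can be replayed. */
final class ReplayingWriter<M> {
    private final MutationSink<M> sink;
    private final List<M> sinceLastFlush = new ArrayList<>();
    private final int maxAttempts;
    private final long sleepMillis;

    ReplayingWriter(MutationSink<M> sink, int maxAttempts, long sleepMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        this.sink = sink;
        this.maxAttempts = maxAttempts;
        this.sleepMillis = sleepMillis;
    }

    /** Buffers locally only; nothing reaches the sink until flush(). */
    void add(M mutation) {
        sinceLastFlush.add(mutation);
    }

    /** Sends the buffered mutations; if commits are held, sleeps and replays all of them. */
    void flush() throws CommitsHeldException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                for (M m : sinceLastFlush) {
                    sink.add(m);
                }
                sink.flush();
                sinceLastFlush.clear(); // only forget mutations once they are durable
                return;
            } catch (CommitsHeldException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: the buffer is kept for the caller
                }
                Thread.sleep(sleepMillis * attempt); // linear back-off between attempts
            }
        }
    }
}
```

Any other exception propagates immediately, since only the held-commit case is assumed to
be worth retrying. The buffer survives a final failure, so the caller can still decide
what to do with the unwritten mutations.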



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
