accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-727) Bulk Import retry time needs to be longer/configurable
Date Fri, 15 Mar 2013 19:16:13 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603715#comment-13603715 ]

Keith Turner commented on ACCUMULO-727:
---------------------------------------

A possible workaround for 1.4 is to increase tserver.bulk.retry.max.  1.4 does not have the
exponential backoff that Eric just added; it simply sleeps 4 seconds between retries, so the
value may need to be set higher.  For example, setting it to 30 (120/4) gives at least
2 minutes of retries.
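
The arithmetic behind that suggestion can be sketched as follows. This is only an
illustration, assuming a fixed sleep between retries as described above; the helper name
retries_for_window is hypothetical and not part of Accumulo.

```python
import math


def retries_for_window(target_seconds: float, sleep_seconds: float = 4.0) -> int:
    """Return the smallest retry count whose fixed sleeps cover target_seconds.

    Hypothetical helper: it assumes one fixed sleep of sleep_seconds per retry
    (4 seconds in 1.4, per the comment above) and ignores the time spent on
    the attempts themselves.
    """
    if target_seconds < 0 or sleep_seconds <= 0:
        raise ValueError("target must be >= 0 and sleep must be > 0")
    return math.ceil(target_seconds / sleep_seconds)


# At least 2 minutes of retries, as suggested above.
print(retries_for_window(120))  # 30
# The roughly 1m 20s tablet reassignment reported in the issue.
print(retries_for_window(80))   # 20
```

The result is a candidate value for tserver.bulk.retry.max; rounding up keeps the total
sleep time at or above the target window.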
                
> Bulk Import retry time needs to be longer/configurable
> ------------------------------------------------------
>
>                 Key: ACCUMULO-727
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-727
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.4.1
>            Reporter: Brian Loss
>            Assignee: Eric Newton
>             Fix For: 1.5.0
>
>
> Bulk import retries way too fast (at least under some circumstances).  We had a tablet
> server that the master killed (we were overloading it with ingest and the hold time got too
> big so the master killed it).  At the same time, a bulk import operation had begun and several
> map files were assigned to the server that was just killed.  The bulk import retried three
> times in an 8 second span, each time failing with a connection refused error, and then gave
> up, failing the file completely.  Meanwhile, it took the master about 1m 20s to reassign the
> tablet to another server.
> The bulk import process should account for this possibility.  Either it needs to recognize
> that it can't connect to a tablet server so it must be down and the tablet will be reassigned
> somewhere else, or it should wait longer (such that the default max wait time is > the
> average tablet reassignment time).  In the latter case, the retry interval should be made
> into a configurable option at the same time.
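
The "wait longer" option described in the issue can be illustrated with a small retry loop
whose total wait is configurable. This is a generic sketch, not the change made in Accumulo;
the names backoff_delays and retry, and the parameters initial, factor, cap and budget, are
assumptions chosen for the example.

```python
import time
from typing import Callable, Iterable, Iterator


def backoff_delays(initial: float, factor: float, cap: float,
                   budget: float) -> Iterator[float]:
    """Yield sleeps that grow by factor, never exceed cap, and sum to budget.

    All four parameters are hypothetical knobs for this sketch.
    """
    if initial <= 0 or factor < 1 or cap <= 0:
        raise ValueError("initial and cap must be > 0, factor must be >= 1")
    delay, waited = initial, 0.0
    while waited < budget:
        # Trim the last sleep so the total never overshoots the budget.
        step = min(delay, cap, budget - waited)
        yield step
        waited += step
        delay *= factor


def retry(operation: Callable[[], bool], delays: Iterable[float],
          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try operation once, then once more after each delay; True on success."""
    if operation():
        return True
    for delay in delays:
        sleep(delay)
        if operation():
            return True
    return False


# Simulated clock: the operation succeeds once 80 seconds have passed,
# roughly the 1m 20s reassignment time reported in the issue.
clock = [0.0]
tablet_reassigned = lambda: clock[0] >= 80
fake_sleep = lambda seconds: clock.__setitem__(0, clock[0] + seconds)

# Three attempts within 8 seconds give up too early.
print(retry(tablet_reassigned, [4, 4], sleep=fake_sleep))  # False

clock[0] = 0.0
# A 120 second budget outlasts the reassignment.
print(retry(tablet_reassigned, backoff_delays(1, 2, 30, 120), sleep=fake_sleep))  # True
```

Passing the sleep function as a parameter keeps the loop testable without real waiting, and
the budget parameter plays the role of the configurable maximum wait the issue asks for.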

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
