Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Fri, 15 Mar 2013 18:54:13 +0000 (UTC)
From: "Eric Newton (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12603258.1344871982515.448620.1363373653249@arcas>
In-Reply-To: <JIRA.12603258.1344871982515@arcas>
References: <JIRA.12603258.1344871982515@arcas>
Subject: [jira] [Resolved] (ACCUMULO-727) Bulk Import retry time needs to be
 longer/configurable
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/ACCUMULO-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Newton resolved ACCUMULO-727.
----------------------------------

    Resolution: Fixed

Added exponential back-off, with a maximum wait between retries of 60 seconds.  Set the number of retries to 5, up from 3.
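The comment gives the cap and the retry count but not the starting delay. Below is a minimal Java sketch of a capped exponential back-off schedule under stated assumptions. The 60-second cap and five retries come from the comment above. The 1-second initial delay, the doubling factor, and the class and method names are hypothetical, chosen only for illustration, and are not taken from the Accumulo code base.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a capped exponential back-off schedule.
 * Only MAX_DELAY_MS (60 s) and MAX_RETRIES (5) come from the resolution
 * comment; the initial delay and doubling factor are assumptions.
 */
public class BackoffSchedule {
    static final long MAX_DELAY_MS = 60_000; // maximum wait between retries
    static final int MAX_RETRIES = 5;        // raised from 3

    /** Wait before each retry: initial, 2x, 4x, ..., never above maxDelayMs. */
    static List<Long> delays(long initialDelayMs, int retries, long maxDelayMs) {
        List<Long> result = new ArrayList<>();
        long delay = initialDelayMs;
        for (int i = 0; i < retries; i++) {
            result.add(Math.min(delay, maxDelayMs));
            delay = Math.min(delay * 2, maxDelayMs); // double, capped
        }
        return result;
    }

    /** Total time spent waiting across all retries. */
    static long totalWait(List<Long> delays) {
        long sum = 0;
        for (long d : delays) {
            sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed 1 s starting delay; the real starting value is not stated.
        List<Long> schedule = delays(1_000, MAX_RETRIES, MAX_DELAY_MS);
        System.out.println(schedule + " total=" + totalWait(schedule) + " ms");
    }
}
```

With the assumed 1-second start, the five waits are 1, 2, 4, 8 and 16 seconds, or 31 seconds in total. The 60-second cap only applies with a larger starting delay or more retries. Whether the retry window covers the roughly 80-second reassignment described in the report therefore depends on the starting value.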

> Bulk Import retry time needs to be longer/configurable
> ------------------------------------------------------
>
>                 Key: ACCUMULO-727
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-727
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.4.1
>            Reporter: Brian Loss
>            Assignee: Eric Newton
>             Fix For: 1.5.0
>
>
> Bulk import retries way too fast (at least under some circumstances).  The master killed one of our tablet servers (we were overloading it with ingest, and its hold time grew too large).  At the same time, a bulk import operation had begun, and several map files had been assigned to the server that was just killed.  The bulk import retried three times in an 8-second span, failing each time with a "connection refused" error, and then gave up, failing the file completely.  Meanwhile, it took the master about 1 minute 20 seconds to reassign the tablet to another server.
> The bulk import process should account for this possibility.  Either it should recognize that a failure to connect means the tablet server is down and the tablet will be reassigned elsewhere, or it should wait longer (so that the default maximum wait time exceeds the average tablet reassignment time).  In the latter case, the retry interval should also be made a configurable option.
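The first option in the report, treating a refused connection as a sign that the tablet server is down and waiting for reassignment, can be sketched as a retry helper. It retries only on `java.net.ConnectException` and takes its retry count and delays as constructor parameters, the way a configurable option would supply them. `ConnectRetry` and its parameters are hypothetical. This is an illustrative sketch, not code from the Accumulo code base.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

/**
 * Hypothetical retry helper: retries an operation only when it fails with
 * ConnectException (e.g. "connection refused" from a dead tablet server),
 * sleeping with capped exponential back-off between attempts.
 * Any other exception propagates immediately.
 */
public class ConnectRetry {
    private final int maxRetries;      // retries after the first attempt
    private final long initialDelayMs; // wait before the first retry
    private final long maxDelayMs;     // upper bound on any single wait

    public ConnectRetry(int maxRetries, long initialDelayMs, long maxDelayMs) {
        this.maxRetries = maxRetries;
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    public <T> T call(Callable<T> op) throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (ConnectException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: give up with the last error
                }
                Thread.sleep(delay);                     // wait before retrying
                delay = Math.min(delay * 2, maxDelayMs); // double, capped
            }
        }
    }
}
```

Limiting retries to connection failures keeps genuine errors (for example, a malformed file) from being retried, while still giving the master time to reassign a tablet.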

--
This message is automatically generated by JIRA.