accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3289) BulkFileIT failed to import files
Date Wed, 12 Nov 2014 18:17:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208381#comment-14208381 ]

Josh Elser commented on ACCUMULO-3289:
--------------------------------------

Saw this fail again last night running on SUSE, both normally and with SSL RPC enabled. In
both runs, the master logs errors from a BatchScanner that repeatedly times out (every 2 minutes).
The tserver that the master keeps timing out against has the following as its last
bulk-import-related log message:

{noformat}
2014-11-12 09:00:45,225 [client.BulkImporter] DEBUG: Estimated map files sizes in   0.00 secs
2014-11-12 09:00:45,254 [client.BulkImporter] DEBUG: Assigning 1 map files to 1 tablets at other_tserver:port
{noformat}

So, the master is (I believe) trying to read the import status with the BatchScanner and repeatedly
times out talking to the tserver that is currently coordinating the bulk import with
the 2nd tserver. The coordinating tserver sits there indefinitely, never appearing to
make any progress, although its normal logging continues.

> BulkFileIT failed to import files
> ---------------------------------
>
>                 Key: ACCUMULO-3289
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3289
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: accumulo-3289.tar.gz
>
>
> Had a failure with BulkFileIT. Looking at the master logs, it appears that the following might have happened:
> * 2 Tservers
> * One of the tservers doesn't respond to communication
> * The master repeatedly contacts it to try to tell it to perform the bulk load
> * The tserver that isn't communicating w/ the master has no errors
> * That tserver logged an assignment that never finished
> * That tserver also got a single bulk import request, and the last thing it logged about that bulk import was "Assigning 1 map files to 3 tablets at ...". The 2nd tserver doesn't appear to have logged anything about the import request that should have been arriving from that tserver.
> * Eventually the master tried to stop that other tserver, but the test timed out ~30s later (not sure whether the tserver would actually have stopped).
> The fact that I see an incomplete assignment and inexplicable bulk load hangs gives me pause in light of ACCUMULO-3276. Will attach some logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
