accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3289) BulkFileIT failed to import files
Date Wed, 12 Nov 2014 20:03:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208591#comment-14208591 ]

Josh Elser commented on ACCUMULO-3289:
--------------------------------------

So, in the one case, after the third attempt by the master to bulk import the files, one of
the tservers did import one of the files (while the other 2 failed again). There's no
apparent reason why the tabletservers fail to respond (the master just reports that the
socket timed out). We may need to add more diagnostics to the tabletservers to rule out
network oddities that might have kept those packets from even reaching the tabletserver
from the master.

> BulkFileIT failed to import files
> ---------------------------------
>
>                 Key: ACCUMULO-3289
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3289
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: accumulo-3289.tar.gz
>
>
> Had a failure with BulkFileIT. Looking at the master logs, it appears that the following might have happened:
> * 2 Tservers
> * One of the tservers doesn't respond to communication
> * The master repeatedly contacts it to try to tell it to perform the bulk load
> * The tserver that isn't communicating w/ the master has no errors
> * That tserver logged an assignment that never finished
> * That tserver also got a single bulk import request, and the last thing it logged WRT that bulk import was "Assigning 1 map files to 3 tablets at ...". The 2nd tserver doesn't appear to have logged anything from that tserver about the import request, which should have been incoming.
> * Eventually the master tried to stop that other tserver, but the test timed out ~30s later (not sure if the tserver would've actually stopped).
> The fact that I see an incomplete assignment and inexplicable bulk load hangs gives me pause in light of ACCUMULO-3276. Will attach some logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
