accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-3276) Shard.xml hung with no client output
Date Thu, 30 Oct 2014 19:08:33 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190630#comment-14190630 ]

Josh Elser edited comment on ACCUMULO-3276 at 10/30/14 7:08 PM:
----------------------------------------------------------------

Some more information now. Let's assume that an assignment hung. For background on the
environment: Bulk, Conditional, Image, MultiTable, Security and Sequential all ran before
Shard did. Let's see if we can figure out when this tablet server stopped bringing
assignments online:

{panel:title=Assignment requests}
{noformat}
2014-10-28 18:14:04,351 [tserver.TabletServer] INFO : Loading tablet 2<<
2014-10-28 18:37:28,475 [tserver.TabletServer] INFO : Loading tablet 3;r03f53;r0155f
2014-10-28 18:37:28,742 [tserver.TabletServer] INFO : Loading tablet 3;r04d6f;r03f53
2014-10-28 18:37:29,280 [tserver.TabletServer] INFO : Loading tablet 3;r07dad;r06c0f
2014-10-28 18:37:29,712 [tserver.TabletServer] INFO : Loading tablet 3;r08ff0;r07dad
2014-10-28 18:37:30,192 [tserver.TabletServer] INFO : Loading tablet 3;r0bece;r0bbef
2014-10-28 20:13:58,094 [tserver.TabletServer] INFO : Loading tablet 7<<
2014-10-28 21:03:52,461 [tserver.TabletServer] INFO : Loading tablet c<<
{noformat}
{panel}

{panel:title=Assignments that were run (not known if completed)}
{noformat}
2014-10-28 18:14:04,358 [tserver.TabletServer] DEBUG: Loading extent: 2<<
2014-10-28 18:37:28,475 [tserver.TabletServer] DEBUG: Loading extent: 3;r03f53;r0155f
{noformat}
{panel}

{panel:title=Assignment acknowledged by the master for our tserver}
{noformat}
2014-10-28 18:14:05,145 [master.EventCoordinator] INFO : tablet 2<< was loaded on tserver:9997
{noformat}
{panel}

I believe this means we can infer that this tserver started loading {{3;r03f53;r0155f}},
never actually finished that assignment, and blocked all later assignments on this server.
Every request after it (the remaining table 3 tablets, {{7<<}} and {{c<<}}) has no matching
"Loading extent" line. Table ID '3' was the table used by Bulk.xml.
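
The cross-check between the three panels can be made repeatable with a short script. This
is only a sketch: the function names and the regular expressions are my own assumptions,
derived from the log lines quoted above rather than from any documented Accumulo log format,
and the sample input is a subset of those quoted lines.

```python
import re

# Patterns are assumptions based on the log excerpts quoted in this comment,
# not on a documented Accumulo log format.
REQUESTED = re.compile(r"INFO\s*: Loading tablet (\S+)")
STARTED = re.compile(r"DEBUG: Loading extent: (\S+)")
LOADED = re.compile(r"INFO\s*: tablet (\S+) was loaded on ")


def extents(pattern, lines):
    """Return the extents matched by pattern, in log order."""
    return [m.group(1) for line in lines if (m := pattern.search(line))]


def find_stuck(tserver_lines, master_lines):
    """Split requested extents into (started but never acknowledged, never started)."""
    requested = extents(REQUESTED, tserver_lines)
    started = set(extents(STARTED, tserver_lines))
    loaded = set(extents(LOADED, master_lines))
    in_flight = [e for e in requested if e in started and e not in loaded]
    never_started = [e for e in requested if e not in started]
    return in_flight, never_started


# Subset of the tserver and master log lines quoted above.
tserver_log = """\
2014-10-28 18:14:04,351 [tserver.TabletServer] INFO : Loading tablet 2<<
2014-10-28 18:14:04,358 [tserver.TabletServer] DEBUG: Loading extent: 2<<
2014-10-28 18:37:28,475 [tserver.TabletServer] INFO : Loading tablet 3;r03f53;r0155f
2014-10-28 18:37:28,475 [tserver.TabletServer] DEBUG: Loading extent: 3;r03f53;r0155f
2014-10-28 18:37:28,742 [tserver.TabletServer] INFO : Loading tablet 3;r04d6f;r03f53
2014-10-28 20:13:58,094 [tserver.TabletServer] INFO : Loading tablet 7<<
""".splitlines()

master_log = """\
2014-10-28 18:14:05,145 [master.EventCoordinator] INFO : tablet 2<< was loaded on tserver:9997
""".splitlines()

in_flight, never_started = find_stuck(tserver_log, master_log)
print("started, never acknowledged:", in_flight)
print("requested, never started:", never_started)
```

An extent that was started but never acknowledged by the master is the candidate for the
hung assignment. Extents that were requested but never started are the ones queued behind it.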


> Shard.xml hung with no client output
> ------------------------------------
>
>                 Key: ACCUMULO-3276
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3276
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.6.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ran Shard.xml over a 5 node instance. The only line of client output I got was that ZooSession
> connected to the quorum.
> 45 minutes later, my test runner timed out the module. We need more information in the
> client test log to actually determine where it got stuck.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
