accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4506) Some in-progress files for replication never replicate
Date Thu, 03 Nov 2016 21:23:58 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634338#comment-15634338
] 

Josh Elser commented on ACCUMULO-4506:
--------------------------------------

Cool. So we at least know that ZK and the Master's memory are consistent. That's good. It
seems like it might be related to the DistributedWorkQueue (which I didn't write, smile).

Check what's inside of {{/accumulo/43d2ac5e-0df0-4727-aba9-05bae9e908e7/replication/workqueue/locks}}.
There should be a corresponding node in there with the same name which corresponds to a TabletServer
performing the work. There should be an ephemeral owner of that lock node.

I'd have to dig more into the implementation of the DWQ. I'm a bit fuzzy on the inner workings.
[~kturner] or [~ecn] might remember more.

>  Some in-progress files for replication never replicate
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4506
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4506
>             Project: Accumulo
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.7.2
>            Reporter: Adam J Shook
>
> We're seeing an issue with replication where two files have been in-progress for a long
time and based on the logs are not going to be replicated.  The metadata from the {{accumulo.replication}}
table looks a little funky, with a very large {{begin}} value.
> *Logs*
> {noformat}
> 2016-11-02 19:52:50,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 to
Remote Name: peer_instance Remote identifier: 5h Source Table ID: k because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477314365827] doesn't need replication
> 2016-11-02 19:53:08,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5i Source Table ID: l because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052816174] doesn't need replication
> {noformat}
> *Replication table*
> {noformat}
> scan -r hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
-t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:j
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:k
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365707]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365707]
> scan -r hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
-t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:j
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:k
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174]
> {noformat}
> *HDFS*
> {noformat}
> hdfs dfs -ls hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -rwxr-xr-x   3 ubuntu supergroup 1117650900 2016-10-24 13:09 hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -rwxr-xr-x   3 ubuntu supergroup 1171968390 2016-10-21 12:31 hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message