accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4506) Some in-progress files for replication never replicate
Date Wed, 02 Nov 2016 20:29:58 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630364#comment-15630364 ]

Josh Elser commented on ACCUMULO-4506:
--------------------------------------

You'll want to figure out why each of these two files still has replication work to do that is not happening:

{noformat}
hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
{noformat}
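To see which peer target each of these entries refers to, it can help to decode the work column qualifier. The sketch below assumes a layout inferred only from the bytes shown: three fields (peer name, remote identifier, source table ID), each written as a presence byte (0x01), a 4-byte big-endian length, and that many UTF-8 bytes. Under that assumption `\x01\x00\x00\x00\x025h` decodes to `5h` and `\x01\x00\x00\x00\x01k` to `k`, which line up with the "Remote identifier" and "Source Table ID" in the Master log lines quoted in the issue below. The class and method names are hypothetical. Note that the printed peer-name prefix (0x17 = 23) does not match the 13 characters of `peer_instance` (perhaps the name was edited before posting), so the test builds its own input instead of decoding the literal text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical decoder for the "work:" column qualifier. The layout is ASSUMED
// from the bytes in the scan output: three fields (peer name, remote
// identifier, source table ID), each written as a presence byte (0x01),
// a 4-byte big-endian length, and that many UTF-8 bytes.
final class WorkQualifierDecoder {

  record Target(String peerName, String remoteIdentifier, String sourceTableId) {}

  static Target decode(byte[] qualifier) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(qualifier))) {
      // Java evaluates constructor arguments left to right, so fields are read in order.
      return new Target(readField(in), readField(in), readField(in));
    } catch (IOException e) {
      // An EOFException lands here when a length prefix overruns the data.
      throw new UncheckedIOException("malformed work qualifier", e);
    }
  }

  private static String readField(DataInputStream in) throws IOException {
    if (!in.readBoolean()) {                 // a 0x00 byte would mean "field not set"
      return null;
    }
    byte[] bytes = new byte[in.readInt()];   // 4-byte big-endian length prefix
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  // Builds a qualifier in the same assumed layout (handy for testing).
  static byte[] encode(String peerName, String remoteIdentifier, String sourceTableId) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      for (String field : new String[] {peerName, remoteIdentifier, sourceTableId}) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        out.writeBoolean(true);
        out.writeInt(bytes.length);
        out.write(bytes);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }
}
```

For example, `decode(encode("peer_instance", "5h", "k"))` yields a `Target` whose remote identifier is `5h` and whose source table ID is `k`.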

The files cannot be removed because replication to this peer still needs to happen (and there would be data loss on that peer if it didn't). The "large" begin value (9223372036854775807, i.e. Long.MAX_VALUE) signifies that replication of that file to the peer is done.
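The meaning of the begin value can be expressed as a small predicate. This is a sketch under stated assumptions, not Accumulo's code: `ReplicationStatus` is a hypothetical record mirroring the fields printed in the scan output, and the rule for files with `infiniteEnd: true` (done exactly when begin is Long.MAX_VALUE) comes from the explanation above, while the bounded-end rule (done once begin reaches end) is an assumption added for completeness.

```java
// Hypothetical mirror of the fields printed for each replication entry.
// ASSUMED rule: with infiniteEnd set, begin == Long.MAX_VALUE
// (9223372036854775807) marks the file as done for the peer; for a bounded
// file, it is treated as done once begin has caught up with end.
record ReplicationStatus(long begin, long end, boolean infiniteEnd, boolean closed, long createdTime) {

  boolean isFullyReplicated() {
    return infiniteEnd ? begin == Long.MAX_VALUE : begin >= end;
  }

  // Work still has to be assigned for this peer until the file is fully replicated.
  boolean needsReplication() {
    return !isFullyReplicated();
  }
}
```

With this rule, an entry such as `[begin: 0 end: 0 infiniteEnd: true closed: true]` still needs replication, while `[begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true]` does not, which matches the "doesn't need replication" wording in the Master logs.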

Look in the Master logs for messages from DistributedWorkQueueWorkAssigner or UnorderedWorkAssigner about this file and that peer. You should see some reason why the Master isn't assigning this work (or some information about the TabletServer that is supposed to be performing the replication). Once that last work entry for each file looks like the others (i.e. has the large begin value), the Master should clean up all of these records, which will let the Accumulo GC remove the file.
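The triage described above (find the work entry that differs from the others, then wait for it to finish before cleanup can happen) can be sketched as follows. Everything here is hypothetical and not taken from Accumulo's source: `OutstandingWork`, `WorkEntry`, and both rules are assumptions based on this thread. Applied to the scan of `9f038f64-4252-44a0-bfd0-99d4a316b397`, it singles out the `5g`/`j` entry, the same one listed for that file at the top of this comment.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical reduced form of the "work:" entries for one WAL file.
// The "done" rule and the cleanup rule are ASSUMPTIONS based on this thread.
final class OutstandingWork {

  record WorkEntry(String remoteIdentifier, String sourceTableId,
                   long begin, long end, boolean infiniteEnd, boolean closed) {
    boolean done() {
      // Closed and fully replicated: begin at Long.MAX_VALUE for an
      // infinite-end file, or begin caught up with end otherwise.
      return closed && (infiniteEnd ? begin == Long.MAX_VALUE : begin >= end);
    }
  }

  // Entries whose replication has not finished; these are the ones to chase in the Master logs.
  static List<WorkEntry> outstanding(List<WorkEntry> entries) {
    return entries.stream().filter(e -> !e.done()).collect(Collectors.toList());
  }

  // The file's records can be cleaned up (letting the GC delete the WAL)
  // only when no entry is outstanding.
  static boolean readyForCleanup(List<WorkEntry> entries) {
    return outstanding(entries).isEmpty();
  }
}
```

Feeding it the three work entries for that file returns a single outstanding entry with remote identifier `5g`, and `readyForCleanup` stays false until that entry's begin value also becomes Long.MAX_VALUE.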

>  Some in-progress files for replication never replicate
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4506
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4506
>             Project: Accumulo
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.7.2
>            Reporter: Adam J Shook
>            Assignee: Josh Elser
>
> We're seeing an issue with replication where two files have been in progress for a long time and, based on the logs, are not going to be replicated. The metadata from the {{accumulo.replication}} table looks a little funky, with a very large {{begin}} value.
> *Logs*
> {noformat}
> 2016-11-02 19:52:50,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing work for hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 to Remote Name: peer_instance Remote identifier: 5h Source Table ID: k because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827] doesn't need replication
> 2016-11-02 19:53:08,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to Remote Name: peer_instance Remote identifier: 5i Source Table ID: l because [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174] doesn't need replication
> {noformat}
> *Replication table*
> {noformat}
> scan -r hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:j []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:k []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365707]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365707]
> scan -r hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 -t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:j []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:k []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k []    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l []    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174]
> {noformat}
> *HDFS*
> {noformat}
> hdfs dfs -ls hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -rwxr-xr-x   3 ubuntu supergroup 1117650900 2016-10-24 13:09 hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -rwxr-xr-x   3 ubuntu supergroup 1171968390 2016-10-21 12:31 hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
