accumulo-notifications mailing list archives

From "Adam J Shook (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-4506) Some in-progress files for replication never replicate
Date Thu, 03 Nov 2016 20:51:58 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15634235#comment-15634235
] 

Adam J Shook commented on ACCUMULO-4506:
----------------------------------------

I'm seeing the following DEBUG message from the UnorderedWorkAssigner:

{noformat}
2016-11-03 20:44:43,718 [replication.UnorderedWorkAssigner] DEBUG: hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
is already queued to be replicated to Remote Name: peer_instance Remote identifier: 5h Source
Table ID: k, not re-queueing
{noformat}

Is it possible that a lost tablet server was supposed to be doing this work, and the system
is waiting for it to come back to life so the work can finish?  Is there any way to clear out
the work queue?

Some context around it:

{noformat}
2016-11-03 20:44:43,715 [replication.DistributedWorkQueueWorkAssigner] INFO : Determining
if hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 from l needs
to be replicated
2016-11-03 20:44:43,718 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5g Source Table ID: j because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052819752] doesn't need replication
2016-11-03 20:44:43,718 [replication.UnorderedWorkAssigner] DEBUG: hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
is already queued to be replicated to Remote Name: peer_instance Remote identifier: 5h Source
Table ID: k, not re-queueing
2016-11-03 20:44:43,718 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5i Source Table ID: l because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052816174] doesn't need replication
2016-11-03 20:44:43,718 [replication.DistributedWorkQueueWorkAssigner] INFO : Assigned 0 replication
work entries for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
2016-11-03 20:44:43,718 [replication.DistributedWorkQueueWorkAssigner] INFO : Determining
if hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 from k needs
to be replicated
2016-11-03 20:44:43,720 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5g Source Table ID: j because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052819752] doesn't need replication
2016-11-03 20:44:43,720 [replication.UnorderedWorkAssigner] DEBUG: hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
is already queued to be replicated to Remote Name: peer_instance Remote identifier: 5h Source
Table ID: k, not re-queueing
2016-11-03 20:44:43,720 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5i Source Table ID: l because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052816174] doesn't need replication
2016-11-03 20:44:43,720 [replication.DistributedWorkQueueWorkAssigner] INFO : Assigned 0 replication
work entries for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
2016-11-03 20:44:43,720 [replication.DistributedWorkQueueWorkAssigner] INFO : Determining
if hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 from j needs
to be replicated
2016-11-03 20:44:43,722 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5g Source Table ID: j because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052819752] doesn't need replication
2016-11-03 20:44:43,722 [replication.UnorderedWorkAssigner] DEBUG: hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
is already queued to be replicated to Remote Name: peer_instance Remote identifier: 5h Source
Table ID: k, not re-queueing
2016-11-03 20:44:43,722 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5i Source Table ID: l because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052816174] doesn't need replication
2016-11-03 20:44:43,722 [replication.DistributedWorkQueueWorkAssigner] INFO : Assigned 0 replication
work entries for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
2016-11-03 20:44:49,187 [replication.WorkMaker] INFO : Processing replication status record
for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 on table
j
2016-11-03 20:44:49,187 [replication.WorkMaker] INFO : Processing replication status record
for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 on table
k
2016-11-03 20:44:49,187 [replication.WorkMaker] INFO : Adding work records for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
to targets {peer_instance=5h}
2016-11-03 20:44:49,188 [replication.WorkMaker] INFO : Processing replication status record
for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 on table
l
2016-11-03 20:44:49,256 [replication.FinishedWorkUpdater] DEBUG: Processing work progress
for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 with 3 columns
2016-11-03 20:44:49,256 [replication.FinishedWorkUpdater] DEBUG: For hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19,
source table ID j has replicated through 9223372036854775807
2016-11-03 20:44:49,256 [replication.FinishedWorkUpdater] DEBUG: Updating replication status
entry for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 with
[begin: 9223372036854775807 end: 0 infiniteEnd: false closed: false]
2016-11-03 20:44:49,256 [replication.FinishedWorkUpdater] DEBUG: For hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19,
source table ID l has replicated through 9223372036854775807
2016-11-03 20:44:49,256 [replication.FinishedWorkUpdater] DEBUG: Updating replication status
entry for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 with
[begin: 9223372036854775807 end: 0 infiniteEnd: false closed: false]
2016-11-03 20:44:49,271 [replication.RemoveCompleteReplicationRecords] DEBUG: Removing hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
repl:j from replication table
{noformat}
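
A note on reading the status records above: 9223372036854775807 is Long.MAX_VALUE (2^63 - 1). The following is a minimal sketch of why such a record might be logged as "doesn't need replication", assuming the check treats a record with infiniteEnd set as finished once begin has reached Long.MAX_VALUE. The class and method names are hypothetical and are not taken from the Accumulo source; the real check may differ.

```java
// Hypothetical stand-in for the "needs replication" check; not Accumulo's actual code.
// Parameter names mirror the fields printed in the log lines above.
final class ReplicationStatusSketch {

    /**
     * Assumed rule: with an infinite end, work remains until begin reaches
     * Long.MAX_VALUE; otherwise work remains while begin is less than end.
     */
    static boolean isWorkRequired(long begin, long end, boolean infiniteEnd) {
        if (infiniteEnd) {
            return begin != Long.MAX_VALUE;
        }
        return begin < end;
    }

    public static void main(String[] args) {
        // Values as logged for repl:j and repl:l of the ae4b03ec-... WAL
        System.out.println(isWorkRequired(9223372036854775807L, 0L, true)); // false
        // Values as shown for repl:k of the same WAL
        System.out.println(isWorkRequired(0L, 0L, true)); // true
    }
}
```

Under this assumption, the records for tables j and l have nothing left to replicate, while the record for table k (begin: 0) still does, which is consistent with the single work entry for table k being reported as already queued.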

>  Some in-progress files for replication never replicate
> -------------------------------------------------------
>
>                 Key: ACCUMULO-4506
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4506
>             Project: Accumulo
>          Issue Type: Bug
>          Components: replication
>    Affects Versions: 1.7.2
>            Reporter: Adam J Shook
>
> We're seeing an issue with replication where two files have been in progress for a long
time and, based on the logs, are not going to be replicated.  The metadata from the {{accumulo.replication}}
table looks a little funky, with a very large {{begin}} value (9223372036854775807, i.e. Long.MAX_VALUE).
> *Logs*
> {noformat}
> 2016-11-02 19:52:50,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 to
Remote Name: peer_instance Remote identifier: 5h Source Table ID: k because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477314365827] doesn't need replication
> 2016-11-02 19:53:08,900 [replication.DistributedWorkQueueWorkAssigner] DEBUG: Not queueing
work for hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 to
Remote Name: peer_instance Remote identifier: 5i Source Table ID: l because [begin: 9223372036854775807
end: 0 infiniteEnd: true closed: true createdTime: 1477052816174] doesn't need replication
> {noformat}
> *Replication table*
> {noformat}
> scan -r hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
-t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:j
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:k
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 repl:l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365707]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477314369633]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365827]
> hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477314365707]
> scan -r hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
-t accumulo.replication
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:j
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:k
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 repl:l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025g\x01\x00\x00\x00\x01j
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052819752]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025h\x01\x00\x00\x00\x01k
[]    [begin: 0 end: 0 infiniteEnd: true closed: true createdTime: 1477052816238]
> hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19 work:\x01\x00\x00\x00\x17peer_instance\x01\x00\x00\x00\x025i\x01\x00\x00\x00\x01l
[]    [begin: 9223372036854775807 end: 0 infiniteEnd: true closed: true createdTime: 1477052816174]
> {noformat}
> *HDFS*
> {noformat}
> hdfs dfs -ls hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> -rwxr-xr-x   3 ubuntu supergroup 1117650900 2016-10-24 13:09 hdfs://host:9000/accumulo/wal/host+31032/9f038f64-4252-44a0-bfd0-99d4a316b397
> -rwxr-xr-x   3 ubuntu supergroup 1171968390 2016-10-21 12:31 hdfs://host:9000/accumulo/wal/host+31368/ae4b03ec-159b-44e8-9a88-ccf7fa849c19
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
