accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2406) GarbageCollectorIT.dontGCRootLog() fails
Date Tue, 25 Feb 2014 21:13:22 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912068#comment-13912068
] 

ASF subversion and git services commented on ACCUMULO-2406:
-----------------------------------------------------------

Commit b23408faea79a9839cbb39932004254229beac79 in accumulo's branch refs/heads/master from
[~keith_turner]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=b23408f ]

ACCUMULO-2406 make GarbageCollectorIT use RawLocalFileSystem so walog writes are not lost


> GarbageCollectorIT.dontGCRootLog() fails
> ----------------------------------------
>
>                 Key: ACCUMULO-2406
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2406
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: 2ef2d88598f5e14f8f96b77fecca66dcd7196448
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.6.0
>
>
> Saw the following failure while running ITs.  The test is not using MiniDFS, which is
what caused the problem.  Because minidfs was not used, recent walog updates were lost.  Below
are my notes from investigating.  To fix this I am going to look into using RawLocalFS like
I did w/ VolumeIT.
> {noformat}
> dontGCRootLog(org.apache.accumulo.test.functional.GarbageCollectorIT)  Time elapsed:
108.685 sec  <<< ERROR!
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException:
Error on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
> 	at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:187)
> 	at org.apache.accumulo.test.functional.GarbageCollectorIT.dontGCRootLog(GarbageCollectorIT.java:163)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server
rd6ul-14723v.infosec.tycho.ncsc.mil:34130
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
> 	at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
> 	at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
> 	... 10 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing startScan
> 	at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
> 	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:226)
> 	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:202)
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:399)
> 	at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:277)
> 	... 12 more
> {noformat}
> Looking in the tserver logs, I found this error was caused by a missing file.
> {noformat}
> TabletServer_1756174045.out:java.io.IOException: Failed to open V/accumulo/tables/!0/table_info/A0000004.rf
> {noformat}
> The file was garbage collected
> {noformat}
> SimpleGarbageCollector_1083069032.out:2014-02-25 12:48:12,897 [gc.SimpleGarbageCollector]
DEBUG: Deleting V/accumulo/tables/!0/table_info/A0000004.rf
> {noformat}
> The file was created on another tablet server and compacted away.  That tablet server
was killed by the test code right after doing the compaction.  The tserver is killed shortly
after compaction 4 into 5.
> {noformat}
> TabletServer_1242447446.out:2014-02-25 12:48:07,256 [tserver.Tablet] DEBUG: Starting
MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000003.rf] --> V/accumulo/tables/!0/table_info/A0000004.rf_tmp
 []
> TabletServer_1242447446.out:2014-02-25 12:48:07,290 [tserver.Tablet] TABLET_HIST: !0;~<
MajC [V/accumulo/tables/!0/table_info/A0000003.rf] --> V/accumulo/tables/!0/table_info/A0000004.rf
> TabletServer_1242447446.out:2014-02-25 12:48:10,007 [tserver.Tablet] DEBUG: Starting
MajC !0;~< (USER) [V/accumulo/tables/!0/table_info/A0000004.rf] --> V/accumulo/tables/!0/table_info/A0000005.rf_tmp
 []
> TabletServer_1242447446.out:2014-02-25 12:48:10,044 [tserver.Tablet] TABLET_HIST: !0;~<
MajC [V/accumulo/tables/!0/table_info/A0000004.rf] --> V/accumulo/tables/!0/table_info/A0000005.rf
> {noformat}
> So when the root tablet recovered it brought back old data. Looking in the root tablet
write ahead log, it only contained the following after the crash.  This is why A0000004.rf
was brought back and A0000005.rf was forgotten.
> {noformat}
> COMPACTION_START 1 8 file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/+r/root_tablet/F000004m.rf
> COMPACTION_FINISH 1 9
> DEFINE_TABLET 1 9 +r<<
> MANY_MUTATIONS 1 9
> 1 mutations:
>   ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000002.rf
>       : [system]:23 [] <deleted>
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0<
>       srv:flush [system]:24 [] 4
>       srv:lock [system]:24 [] tservers/host1:44035/zlock-0000000000$1446a28bfef0003
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0;~
>       srv:flush [system]:25 [] 4
>       srv:lock [system]:25 [] tservers/host1:57501/zlock-0000000000$1446a28bfef0001
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0<
>       srv:compact [system]:26 [] 4
>       srv:lock [system]:26 [] tservers/host1:44035/zlock-0000000000$1446a28bfef0003
> MANY_MUTATIONS 1 9
> 1 mutations:
>   ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
>       : [system]:27 [] 
> MANY_MUTATIONS 1 9
> 1 mutations:
>   !0;~
>       file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
[system]:28 [] <deleted>
>       file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000004.rf
[system]:28 [] 578,5
>       srv:compact [system]:28 [] 4
>       last:1446a28bfef0001 [system]:28 [] host1:57501
>       srv:lock [system]:28 [] tservers/host1:57501/zlock-0000000000$1446a28bfef0001
> {noformat}
> below is some info about the root tablet recovery from the tserver logs
> {noformat}
> 2014-02-25 12:49:26,353 [tserver.Tablet] INFO : Starting Write-Ahead Log recovery for
+r<<
> 2014-02-25 12:49:26,356 [tserver.TabletServer] INFO : Looking for V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40/finished
> 2014-02-25 12:49:26,360 [log.SortedLogRecovery] INFO : Looking at mutations from V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
for +r<<
> 2014-02-25 12:49:26,376 [log.SortedLogRecovery] DEBUG: Found tid, seq 1 1
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004g.rf
finished, but was still in the METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004i.rf
finished, but was still in the METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004k.rf
finished, but was still in the METADATA
> 2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004m.rf
finished, but was still in the METADATA
> 2014-02-25 12:49:26,382 [log.SortedLogRecovery] INFO : Scanning for mutations starting
at sequence number 8 for tid 1
> 2014-02-25 12:49:26,386 [log.SortedLogRecovery] INFO : Recovery complete for +r<<
using V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
> 2014-02-25 12:49:26,387 [tserver.Tablet] INFO : Write-Ahead Log recovery complete for
+r<< (6 mutations applied, 13 entries created)
> 2014-02-25 12:49:26,396 [tserver.Tablet] TABLET_HIST: +r<< opened
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message