accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-2406) GarbageCollectorIT.dontGCRootLog() fails
Date Tue, 25 Feb 2014 20:10:23 GMT
Keith Turner created ACCUMULO-2406:
--------------------------------------

             Summary: GarbageCollectorIT.dontGCRootLog() fails
                 Key: ACCUMULO-2406
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2406
             Project: Accumulo
          Issue Type: Bug
         Environment: 2ef2d88598f5e14f8f96b77fecca66dcd7196448
            Reporter: Keith Turner
            Assignee: Keith Turner
             Fix For: 1.6.0


Saw the following failure while running ITs.  The test is not using MiniDFS, which is what
caused the problem.  Because minidfs was not used, recent walog updates were lost.  Below
are my notes from investigating.  To fix this I am going to look into using RawLocalFS like
I did w/ VolumeIT.

{noformat}
dontGCRootLog(org.apache.accumulo.test.functional.GarbageCollectorIT)  Time elapsed: 108.685
sec  <<< ERROR!
java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException:
Error on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
	at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:187)
	at org.apache.accumulo.test.functional.GarbageCollectorIT.dontGCRootLog(GarbageCollectorIT.java:163)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server rd6ul-14723v.infosec.tycho.ncsc.mil:34130
	at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
	at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:84)
	at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:177)
	... 10 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing startScan
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:226)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:202)
	at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:399)
	at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:277)
	... 12 more
{noformat}

Looking in the tserver logs, I found this error was caused by a missing file.

{noformat}
TabletServer_1756174045.out:java.io.IOException: Failed to open V/accumulo/tables/!0/table_info/A0000004.rf
{noformat}

The file was garbage collected

{noformat}
SimpleGarbageCollector_1083069032.out:2014-02-25 12:48:12,897 [gc.SimpleGarbageCollector]
DEBUG: Deleting V/accumulo/tables/!0/table_info/A0000004.rf
{noformat}

The file was created on another tablet server and compacted away.  That tablet server was
killed by the test code right after doing the compaction.  The tserver is killed shortly after
compaction 4 into 5.

{noformat}
TabletServer_1242447446.out:2014-02-25 12:48:07,256 [tserver.Tablet] DEBUG: Starting MajC
!0;~< (USER) [V/accumulo/tables/!0/table_info/A0000003.rf] --> V/accumulo/tables/!0/table_info/A0000004.rf_tmp
 []
TabletServer_1242447446.out:2014-02-25 12:48:07,290 [tserver.Tablet] TABLET_HIST: !0;~<
MajC [V/accumulo/tables/!0/table_info/A0000003.rf] --> V/accumulo/tables/!0/table_info/A0000004.rf
TabletServer_1242447446.out:2014-02-25 12:48:10,007 [tserver.Tablet] DEBUG: Starting MajC
!0;~< (USER) [V/accumulo/tables/!0/table_info/A0000004.rf] --> V/accumulo/tables/!0/table_info/A0000005.rf_tmp
 []
TabletServer_1242447446.out:2014-02-25 12:48:10,044 [tserver.Tablet] TABLET_HIST: !0;~<
MajC [V/accumulo/tables/!0/table_info/A0000004.rf] --> V/accumulo/tables/!0/table_info/A0000005.rf
{noformat}

So when the root tablet recovered it brought back old data. Looking in the root tablet write
ahead log, it only contained the following after the crash.  This is why A0000004.rf was brought
back and A0000005.rf was forgotten.

{noformat}
COMPACTION_START 1 8 file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/+r/root_tablet/F000004m.rf

COMPACTION_FINISH 1 9

DEFINE_TABLET 1 9 +r<<

MANY_MUTATIONS 1 9
1 mutations:
  ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000002.rf
      : [system]:23 [] <deleted>

MANY_MUTATIONS 1 9
1 mutations:
  !0<
      srv:flush [system]:24 [] 4
      srv:lock [system]:24 [] tservers/host1:44035/zlock-0000000000$1446a28bfef0003

MANY_MUTATIONS 1 9
1 mutations:
  !0;~
      srv:flush [system]:25 [] 4
      srv:lock [system]:25 [] tservers/host1:57501/zlock-0000000000$1446a28bfef0001

MANY_MUTATIONS 1 9
1 mutations:
  !0<
      srv:compact [system]:26 [] 4
      srv:lock [system]:26 [] tservers/host1:44035/zlock-0000000000$1446a28bfef0003

MANY_MUTATIONS 1 9
1 mutations:
  ~delfile:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
      : [system]:27 [] 

MANY_MUTATIONS 1 9
1 mutations:
  !0;~
      file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000003.rf
[system]:28 [] <deleted>
      file:file:/local/disk1/jenkins/workspace/accumulo16/test/target/mini-tests/org.apache.accumulo.test.functional.GarbageCollectorIT_dontGCRootLog/accumulo/tables/!0/table_info/A0000004.rf
[system]:28 [] 578,5
      srv:compact [system]:28 [] 4
      last:1446a28bfef0001 [system]:28 [] host1:57501
      srv:lock [system]:28 [] tservers/host1:57501/zlock-0000000000$1446a28bfef0001

{noformat}

below is some info about the root tablet recovery from the tserver logs

{noformat}
2014-02-25 12:49:26,353 [tserver.Tablet] INFO : Starting Write-Ahead Log recovery for +r<<
2014-02-25 12:49:26,356 [tserver.TabletServer] INFO : Looking for V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40/finished
2014-02-25 12:49:26,360 [log.SortedLogRecovery] INFO : Looking at mutations from V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
for +r<<
2014-02-25 12:49:26,376 [log.SortedLogRecovery] DEBUG: Found tid, seq 1 1
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004g.rf
finished, but was still in the METADATA
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004i.rf
finished, but was still in the METADATA
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004k.rf
finished, but was still in the METADATA
2014-02-25 12:49:26,378 [log.SortedLogRecovery] DEBUG: minor compaction into V/accumulo/tables/+r/root_tablet/F000004m.rf
finished, but was still in the METADATA
2014-02-25 12:49:26,382 [log.SortedLogRecovery] INFO : Scanning for mutations starting at
sequence number 8 for tid 1
2014-02-25 12:49:26,386 [log.SortedLogRecovery] INFO : Recovery complete for +r<< using
V/accumulo/recovery/770c5d8a-c598-4e3c-8b22-9ce27eee5f40
2014-02-25 12:49:26,387 [tserver.Tablet] INFO : Write-Ahead Log recovery complete for +r<<
(6 mutations applied, 13 entries created)
2014-02-25 12:49:26,396 [tserver.Tablet] TABLET_HIST: +r<< opened
{noformat}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message