accumulo-user mailing list archives

From Keys Botzum <kbot...@maprtech.com>
Subject Re: Accumulo on MapR - BatchScanSplitTest
Date Thu, 12 Apr 2012 12:03:08 GMT
Eric,

You nailed it. That was the issue. Not only does the test complete, it now completes much
more quickly. Huge improvement. However, in my defense, Accumulo was configured to use those
arguments in conf/accumulo-env.sh. The issue seems to be that the test driver didn't honor
those settings. I think I fixed this via a hack to TestUtils.py. I edited this section (new
stuff in bold/red):


    def runOn(self, host, cmd, **opts):
        cmd = map(str, cmd)
        log.debug('%s: %s', host, ' '.join(cmd))
        if host == 'localhost':
            os.environ['ACCUMULO_TSERVER_OPTS']='-Xmx700m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75'
            os.environ['ACCUMULO_GENERAL_OPTS']='-Dorg.apache.accumulo.config.file=%s ' % SITE
            os.environ['ACCUMULO_LOG_DIR']= ACCUMULO_HOME + '/logs/' + ID
            return Popen(cmd, stdout=PIPE, stderr=PIPE, **opts)


The test now runs successfully for me, so I fixed at least that path through the code.
Obviously, that's a quick and dirty hack. I'm sure someone more familiar with the
framework can do a cleaner fix.
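
For what it's worth, here is a rough sketch of what a cleaner version might look like. It
assumes the goal is to pass the settings to the child process through Popen's env argument
instead of changing os.environ for the whole test driver. build_child_env and run_local are
hypothetical names, not part of TestUtils.py, and the values are simply the ones from the
hack above.

```python
import os
import sys
from subprocess import Popen, PIPE

# JVM flags from the hack above (the GC flags are the ones Eric suggested).
TSERVER_OPTS = ('-Xmx700m -XX:+UseConcMarkSweepGC '
                '-XX:CMSInitiatingOccupancyFraction=75')


def build_child_env(site, accumulo_home, test_id, base=None):
    """Return a copy of the environment with the Accumulo settings added.

    Hypothetical helper: the caller's os.environ is copied, not modified.
    """
    env = dict(os.environ if base is None else base)
    env['ACCUMULO_TSERVER_OPTS'] = TSERVER_OPTS
    env['ACCUMULO_GENERAL_OPTS'] = '-Dorg.apache.accumulo.config.file=%s' % site
    env['ACCUMULO_LOG_DIR'] = os.path.join(accumulo_home, 'logs', test_id)
    return env


def run_local(cmd, env, **opts):
    """Start cmd as a child process that sees only the supplied environment."""
    cmd = [str(c) for c in cmd]
    return Popen(cmd, stdout=PIPE, stderr=PIPE, env=env, **opts)
```

In runOn, the localhost branch could then call something like
run_local(cmd, build_child_env(SITE, ACCUMULO_HOME, ID), **opts). I have not tried this
against the real framework, so treat it as an illustration only.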

As soon as I have a moment I'm going to go back and rerun all of the tests. I suspect they
will all run much faster because of this change and the timeout issues I was seeing may simply
go away.

Once again, thanks!
Keys
________________________________
Keys Botzum
Senior Principal Technologist
WW Systems Engineering
kbotzum@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com



On Apr 11, 2012, at 1:12 PM, Eric Newton wrote:

> Your tablet server crashed (the master reported the loss of the server).  My guess is
> that it had a stop-the-world gc that lasted longer than the zookeeper timeout.
> 
> Is there some reason why you aren't using the suggested
> 
> -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
> 
> jvm arguments?
> 
> -Eric
> 
> On Tue, Apr 10, 2012 at 2:21 PM, Keys Botzum <kbotzum@maprtech.com> wrote:
> At this point all but two of the Accumulo test/system/auto tests have completed successfully.
> This test is failing and I'm not quite sure why: simple.batchScanSplit.BatchScanSplitTest
> 
> When I run it, this is the output I see:
> ./run.py -t batchandsplittest -d -v10
> ….
> DEBUG:test.auto:localhost: /opt/accumulo-1.4.0/bin/accumulo org.apache.accumulo.server.test.functional.FunctionalTest
-m localhost -u root -p secret -i SE-test-04-17205 org.apache.accumulo.server.test.functional.BatchScanSplitTest
run
> DEBUG:test.auto:Waiting for /opt/accumulo-1.4.0/bin/accumulo org.apache.accumulo.server.test.functional.FunctionalTest
-m localhost -u root -p secret -i SE-test-04-17205 org.apache.accumulo.server.test.functional.BatchScanSplitTest
run to stop in 240 secs
> DEBUG:test.auto:out: 10 11:01:17,382 [admin.TableOperations] INFO : Problem with metadata
table, first entry for table 1- 1<;00002 - has non null prev end row ... retrying ...
> DEBUG:test.auto:out: splits : [0000019cd, 0000026b4, 000004082, 000005a50, 000006737,
00000741e, 000007a0f, 000008, 000008dec, 000009ad3, 00000b4a1, 00000ba5, 00000c, 00000ce6f,
00000db56, 00000f524, 00000fa92, 00001, 000010ef2, 000011bd9, 0000135a7, 000013ad4, 000014,
000014f75, 000015c5c, 00001762a, 000017b15, 000018, 000018ff8, 000019cdf, 00001b6ad, 00001bb57,
00001c, 00001d07b, 00001dd62, 00001f730, 00001fb98, 00002, 0000210fe, 000021de5, 0000237b3,
000023bda, 000024, 000025181, 000025e68, 000027836, 000027c1b, 000028, 000029204, 000029eeb,
00002b8b9, 00002bc5d, 00002c, 00002d287, 00002df6e, 00002f93c, 00002fc9e, 00003, 00003130a,
000031ff1, 0000339bf, 000033ce, 000034, 00003538d, 000036074, 0000374b5, 000037a5b, 000038,
000038e83, 000039b6a, 00003b538, 00003c, 00003dbed, 00003e8d4]
> DEBUG:test.auto:out:
> DEBUG:test.auto:out: rate :
> DEBUG:test.auto:out: 205.34
> DEBUG:test.auto:out: 10 11:03:01,983 [impl.ThriftScanner] WARN : Security Violation in
scan request to 10.250.99.204:39256: ThriftSecurityException(user:root, code:null)
> DEBUG:test.auto:err: Thread "org.apache.accumulo.server.test.functional.FunctionalTest"
died null
> DEBUG:test.auto:err: java.lang.reflect.InvocationTargetException
> DEBUG:test.auto:err:    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> DEBUG:test.auto:err:    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.accumulo.start.Main$1.run(Main.java:89)
>        at java.lang.Thread.run(Thread.java:662)
> DEBUG:test.auto:err: Caused by: java.lang.NullPointerException
> DEBUG:test.auto:err:    at org.apache.accumulo.core.client.AccumuloSecurityException.getDefaultErrorMessage(AccumuloSecurityException.java:30)
>        at org.apache.accumulo.core.client.AccumuloSecurityException.<init>(AccumuloSecurityException.java:70)
>        at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:153)
>        at org.apache.accumulo.core.client.impl.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:88)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:393)
> DEBUG:test.auto:err:    at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:536)
>        at org.apache.accumulo.core.client.impl.TabletLocator$1._locateTablet(TabletLocator.java:115)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:370)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:390)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:536)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:215)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:288)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.binRanges(TabletServerBatchReaderIterator.java:236)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.processFailures(TabletServerBatchReaderIterator.java:301)
> DEBUG:test.auto:err:    at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.access$900(TabletServerBatchReaderIterator.java:73)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:374)
>        at org.apache.accumulo.cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        ... 1 more
> DEBUG:test.auto:Output from command: 10 11:01:17,382 [admin.TableOperations] INFO : Problem
with metadata table, first entry for table 1- 1<;00002 - has non null prev end row ...
retrying ...
> splits : [0000019cd, 0000026b4, 000004082, 000005a50, 000006737, 00000741e, 000007a0f,
000008, 000008dec, 000009ad3, 00000b4a1, 00000ba5, 00000c, 00000ce6f, 00000db56, 00000f524,
00000fa92, 00001, 000010ef2, 000011bd9, 0000135a7, 000013ad4, 000014, 000014f75, 000015c5c,
00001762a, 000017b15, 000018, 000018ff8, 000019cdf, 00001b6ad, 00001bb57, 00001c, 00001d07b,
00001dd62, 00001f730, 00001fb98, 00002, 0000210fe, 000021de5, 0000237b3, 000023bda, 000024,
000025181, 000025e68, 000027836, 000027c1b, 000028, 000029204, 000029eeb, 00002b8b9, 00002bc5d,
00002c, 00002d287, 00002df6e, 00002f93c, 00002fc9e, 00003, 00003130a, 000031ff1, 0000339bf,
000033ce, 000034, 00003538d, 000036074, 0000374b5, 000037a5b, 000038, 000038e83, 000039b6a,
00003b538, 00003c, 00003dbed, 00003e8d4]
> rate : 205.34
> 10 11:03:01,983 [impl.ThriftScanner] WARN : Security Violation in scan request to 10.250.99.204:39256:
ThriftSecurityException(user:root, code:null)
> ERROR:test.auto:This looks like a stack trace: Thread "org.apache.accumulo.server.test.functional.FunctionalTest"
died null
> java.lang.reflect.InvocationTargetException
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.accumulo.start.Main$1.run(Main.java:89)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.NullPointerException
>        at org.apache.accumulo.core.client.AccumuloSecurityException.getDefaultErrorMessage(AccumuloSecurityException.java:30)
>        at org.apache.accumulo.core.client.AccumuloSecurityException.<init>(AccumuloSecurityException.java:70)
>        at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:153)
>        at org.apache.accumulo.core.client.impl.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:88)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:393)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:536)
>        at org.apache.accumulo.core.client.impl.TabletLocator$1._locateTablet(TabletLocator.java:115)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:370)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:390)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:536)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:215)
>        at org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:288)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.binRanges(TabletServerBatchReaderIterator.java:236)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.processFailures(TabletServerBatchReaderIterator.java:301)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.access$900(TabletServerBatchReaderIterator.java:73)
>        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:374)
>        at org.apache.accumulo.cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        ... 1 more
> 
> FAIL
> ======================================================================
> FAIL: runTest (simple.batchScanSplit.BatchScanSplitTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "/opt/accumulo-1.4.0/test/system/auto/JavaTest.py", line 57, in runTest
>    self.waitForStop(handle, self.maxRuntime)
>  File "/opt/accumulo-1.4.0/test/system/auto/TestUtils.py", line 368, in waitForStop
>    self.assert_(self.processResult(out, err, handle.returncode))
> AssertionError: False is not true
> 
> 
> ======================================================================
> FAIL: runTest (simple.batchScanSplit.BatchScanSplitTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>  File "/opt/accumulo-1.4.0/test/system/auto/JavaTest.py", line 57, in runTest
>    self.waitForStop(handle, self.maxRuntime)
>  File "/opt/accumulo-1.4.0/test/system/auto/TestUtils.py", line 368, in waitForStop
>    self.assert_(self.processResult(out, err, handle.returncode))
> AssertionError: False is not true
> 
> ----------------------------------------------------------------------
> Ran 1 test in 118.172s
> 
> FAILED (failures=1)
> 
> 
> In this case, I found nothing in the logs that seemed helpful, but perhaps I don't understand
> what I'm looking at. All I see in the tserver_xxx.debug.log prior to the error is this, and
> the times are nearly 2 minutes before the failure.
> 
> 10 11:01:20,906 [tabletserver.Tablet] DEBUG: completeClose(saveState=true completeClose=false)
1;00001bb57;00001b6ad
> 10 11:01:20,908 [tabletserver.TabletServer] DEBUG: ScanSess tid 10.250.99.204:58769 !0
1 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
> 10 11:01:20,911 [tabletserver.TabletServer] DEBUG: ScanSess tid 10.250.99.204:58769 !0
1 entries in 0.00 secs, nbTimes = [0 0 0.00 1]
> 10 11:01:20,911 [file.FileUtil] DEBUG: Found midPoint from indexes in   0.01 secs.
> 
> 10 11:01:20,913 [tabletserver.TabletServer] DEBUG: ScanSess tid 10.250.99.204:58769 !0
1 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
> 10 11:01:20,914 [file.FileUtil] WARN : Failed to find mid point using indexes, falling
back to data files which is slower. No entries between 00000fa92 and 00000fd49 for [/user/mapr/accumulo-SE-test-04-17205/tables/1/default_tablet/F0000000.rf]
> 10 11:01:20,930 [file.FileUtil] DEBUG: Found midPoint from indexes in   0.02 secs.
> 
> 10 11:01:20,935 [file.FileUtil] WARN : Failed to find mid point using indexes, falling
back to data files which is slower. No entries between 00000fd49 and 00001 for [/user/mapr/accumulo-SE-test-04-17205/tables/1/default_tablet/F0000000.rf]
> 10 11:01:20,952 [file.FileUtil] DEBUG: Found midPoint from indexes in   0.02 secs.
> 
> There is no further output. Here is the complete tar and gzip of the logs:
> 
> 
> 
> 
> I really have no idea what is wrong here. I suspect some kind of security issue based
> on the stack trace, but it's hard to tell. It looks like Accumulo was trying to report a security
> error but the error handling code had an issue. Is there some setting I need to provide to
> ensure that Accumulo has access to its error messages?
> 
> If anyone has suggestions on how to look into this further from the Accumulo side, I'd
> really appreciate it.
> 
> Thanks,
> Keys
> ________________________________
> Keys Botzum
> Senior Principal Technologist
> WW Systems Engineering
> kbotzum@maprtech.com
> 443-718-0098
> MapR Technologies
> http://www.mapr.com
> 
> 
> 

