accumulo-user mailing list archives

From Keys Botzum <kbot...@maprtech.com>
Subject Accumulo on MapR Continued
Date Fri, 30 Mar 2012 18:07:49 GMT
I've been building on the work of Todd Stavish to put Accumulo onto MapR:

http://t.co/RJJ8Ht4B
http://mail-archives.apache.org/mod_mbox/accumulo-user/201203.mbox/%3CCADKA3WiSRDrTw_vKcB9zrP0oyq01jQfZOBGQDRnujEjQ3kHd2w@mail.gmail.com%3E


I've installed Accumulo on a 5-node MapR cluster (real hardware), basing my work on what Todd
posted earlier. I'm using Accumulo 1.4.0 RC3 (downloaded from http://people.apache.org/~kturner/1.4.0rc3/),
which contains the file system fix (https://issues.apache.org/jira/browse/ACCUMULO-476).

The install went fine. I can start the Accumulo shell and create tables and such without difficulty.


I then decided to do a more complete test using the Accumulo built-in tests found under ACCUMULO_HOME/test,
and this is where I started encountering problems. It may be that I just don't understand
how to configure and run the tests properly, or it may be something deeper. Any help would be
greatly appreciated.

I decided to run the tests under the auto directory as those seemed promising (was that a
good idea?). As instructed in the README, I can start tests in local mode (no MapReduce job)
using ./run.py -t testname, or leave out the -t to run all tests. When I run the tests (all
at once or individually), the majority succeed but a small subset consistently fails. They
are:

FAIL: runTest (simple.batchScanSplit.BatchScanSplitTest)
FAIL: runTest (simple.bulkSplitOptimization.BulkSplitOptimizationTest)
FAIL: runTest (simple.deleterows.DeleteRowsSplitTest)
FAIL: runTest (simple.dynamic.DynamicClassloader)
FAIL: runTest (simple.largeRow.LargeRowTest)
FAIL: runTest (simple.merge.MergeTest)
FAIL: runTest (simple.zooCacheTest.ZooCacheTest)

Some fail because of timeouts, others fail with assertion errors. Rather than dumping every
issue into the forum, I thought I'd just start by focusing on one test: simple.bulkSplitOptimization.BulkSplitOptimizationTest.
When I run this test by itself, this is the output:

./run.py -t ulkSplitOptimizationTest -d -v 10
…
DEBUG:test.auto:out:        1,010 records written |
DEBUG:test.auto:out:   72,142 records/sec |       92,920 bytes written | 6,637,142 bytes/sec |
DEBUG:test.auto:out:  0.014 secs
FAIL
======================================================================
FAIL: runTest (simple.bulkSplitOptimization.BulkSplitOptimizationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/accumulo-1.4.0/test/system/auto/JavaTest.py", line 56, in runTest
    self.waitForStop(handle, self.maxRuntime)
  File "/opt/accumulo-1.4.0/test/system/auto/TestUtils.py", line 371, in waitForStop
    self.fail("Process failed to finish in %s seconds" % secs)
AssertionError: Process failed to finish in 200 seconds


======================================================================
FAIL: runTest (simple.bulkSplitOptimization.BulkSplitOptimizationTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/accumulo-1.4.0/test/system/auto/JavaTest.py", line 56, in runTest
    self.waitForStop(handle, self.maxRuntime)
  File "/opt/accumulo-1.4.0/test/system/auto/TestUtils.py", line 371, in waitForStop
    self.fail("Process failed to finish in %s seconds" % secs)
AssertionError: Process failed to finish in 200 seconds

----------------------------------------------------------------------
Ran 1 test in 208.163s

FAILED (failures=1)

Since I set the test to preserve the logs, I looked at them and found only one thing that
jumps out at me as the problem. I see this output repeated over and over while the test is
running:

30 11:00:14,587 [tabletserver.TabletServer] DEBUG: ScanSess tid 10.250.99.204:45335 !0 2 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
30 11:00:15,092 [tabletserver.TabletServer] DEBUG: ScanSess tid 10.250.99.204:45335 !0 2 entries in 0.00 secs, nbTimes = [0 0 0.00 1]
30 11:00:15,499 [file.FileUtil] DEBUG: Too many indexes (100) to open at once for null null, reducing in tmpDir = /user/mapr/accumulo-SE-test-04-17598/tmp/idxReduce_1785031962
30 11:00:15,552 [tabletserver.TabletServer] DEBUG: gc PS Scavenge=4.22(+0.00) secs PS MarkSweep=0.18(+0.00) secs freemem=74,816,456(+5,884,960) totalmem=110,100,480
30 11:00:15,596 [tabletserver.TabletServer] DEBUG: ScanSess tid 10.250.99.204:45335 !0 2 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
30 11:00:15,643 [file.FileUtil] DEBUG: Finished reducing indexes for null null in   0.14 secs
30 11:00:15,654 [file.FileUtil] DEBUG: Found midPoint from indexes in   0.01 secs.

30 11:00:15,778 [tabletserver.Tablet] ERROR: Failed to find lastkey Seeking beyond EOF, filelen 733, wantpos 733
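
With this much DEBUG output, it can help to filter a preserved log down to the entries above DEBUG. The following is a rough sketch, assuming every entry starts with the "day HH:MM:SS,mmm [logger] LEVEL:" prefix seen in the excerpt above. The function name non_debug_entries is hypothetical. Lines that do not start with that prefix are treated as continuations of the previous entry, which also copes with entries wrapped across lines.

```python
import re

# Prefix of one log entry, e.g.
# "30 11:00:15,778 [tabletserver.Tablet] ERROR: Failed to find lastkey ..."
# Assumption: all entries in the preserved logs follow this layout.
ENTRY_RE = re.compile(r"^(\d+ \d{2}:\d{2}:\d{2},\d{3}) \[([^\]]+)\] (\w+): ?(.*)$")


def non_debug_entries(lines, levels=("WARN", "ERROR", "FATAL")):
    """Collect entries whose level is in `levels`, joining wrapped lines."""
    entries = []
    current = None
    for raw in lines:
        line = raw.rstrip()
        match = ENTRY_RE.match(line)
        if match:
            current = {
                "time": match.group(1),
                "logger": match.group(2),
                "level": match.group(3),
                "message": match.group(4),
            }
            if current["level"] in levels:
                entries.append(current)
        elif current is not None and line:
            # Continuation of the previous entry (wrapped or multi-line message).
            current["message"] += " " + line.strip()
    return entries


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py <path to a preserved log file>
    with open(sys.argv[1]) as handle:
        for entry in non_debug_entries(handle):
            print(entry["time"], entry["logger"], entry["level"], entry["message"])
```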


I suspect the last line is the cause, but I do not understand what it means or how to investigate
this further. Of course I can provide any additional information you need, but I didn't want
to make this post even longer at first.
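
One detail in that error may be worth noting: filelen and wantpos are both 733, so the seek targets the position immediately after the last byte rather than somewhere well past the end. Whether the file system client is expected to accept such a seek is an open question that the log does not answer. The sketch below only classifies messages of this form so that other occurrences can be compared. The function name classify_seek and the category labels are hypothetical.

```python
import re

# Pulls the two numbers out of a message such as
# "Seeking beyond EOF, filelen 733, wantpos 733".
EOF_RE = re.compile(r"Seeking beyond EOF, filelen (\d+), wantpos (\d+)")


def classify_seek(message):
    """Describe where the requested position lies relative to the file length.

    Returns None when the message does not contain the expected text.
    """
    match = EOF_RE.search(message)
    if match is None:
        return None
    filelen, wantpos = int(match.group(1)), int(match.group(2))
    if wantpos == filelen:
        return "seek to exact end of file"
    if wantpos > filelen:
        return "seek past end of file"
    return "seek inside file"


if __name__ == "__main__":
    print(classify_seek(
        "Failed to find lastkey Seeking beyond EOF, filelen 733, wantpos 733"
    ))
```

If every occurrence falls into the first category, that would narrow the question to how a seek to the exact end of a file is handled.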

Help is appreciated.

Thanks!
Keys

P.S. As a sanity check, I installed Accumulo a second time on a MapR M5 VM with one CPU, and the
same tests fail.
________________________________
Keys Botzum
Senior Principal Technologist
WW Systems Engineering
kbotzum@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com



