accumulo-dev mailing list archives

From Dylan Hutchison <dhutc...@cs.washington.edu>
Subject Re: Dealing with FastBulkImportIT
Date Sat, 13 Aug 2016 22:33:30 GMT
ACCUMULO-3327 <https://issues.apache.org/jira/browse/ACCUMULO-3327> is a
perfect example of a performance bug.  The tablet servers used to reload
the bulk imported flags from the metadata table with every request.  There
is nothing wrong with the extra reloads in terms of correctness, but it
does slow the import process down.  This aspect makes it hard to test.

The JUnit category is a nice idea.  One idea to complement it is the
following procedure:

   1. Run each performance test on code *from an earlier, reference commit*.
   If a test fails, then there is a correctness problem and it should be
   treated as a failed test as usual.  If the tests all pass, write out the
   performance times to a baseline file in a special folder, maybe
   <accumulo_src_dir>/bench.
   2. Run each performance test again, now on the new commit you want to
   test.  Compare runtimes.  If a runtime for some test increased
   "significantly" (say >10%; per-test user-configurable by annotation), then
   flag that to the user.  Maybe treat that as a failure.
   3. The output timings from these tests should be a human-readable report.

I bet there are frameworks out there for this kind of thing.  They might
have some out-of-the-box functions like warming up code by running it once
before timing it.  But it may be easier to whip up a simple solution using
JUnit.
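
For concreteness, here is a rough sketch of what that simple JUnit solution
might look like. The class name, the bench/ baseline directory, and the
hard-coded 10% threshold are all hypothetical, and the bulk import itself is
elided; it only illustrates the write-baseline-then-compare flow from the
procedure above:

import static org.junit.Assert.fail;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

import org.junit.Test;

public class BulkImportBenchIT {

  @Test
  public void bulkImport() throws Exception {
    long start = System.nanoTime();
    // ... perform the bulk import being benchmarked ...
    long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    checkAgainstBaseline("bulkImport", millis, 0.10);
  }

  // Step 1: on the reference commit the baseline file does not exist yet,
  // so record it.  Step 2: on the new commit, compare against the baseline
  // and flag runs that are more than `tolerance` slower.
  private static void checkAgainstBaseline(String name, long millis, double tolerance)
      throws IOException {
    Path baseline = Paths.get("bench", name + ".baseline");
    if (!Files.exists(baseline)) {
      Files.createDirectories(baseline.getParent());
      Files.write(baseline, Long.toString(millis).getBytes(StandardCharsets.UTF_8));
      return;
    }
    long expected = Long.parseLong(
        new String(Files.readAllBytes(baseline), StandardCharsets.UTF_8).trim());
    if (millis > expected * (1 + tolerance)) {
      fail(String.format("%s took %d ms, more than %d%% over baseline of %d ms",
          name, millis, Math.round(tolerance * 100), expected));
    }
  }
}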

Also: we might embrace our friends in Apache HTrace
<https://htrace.incubator.apache.org/>.  HTrace makes it simple to time and
log specific spans of code.  We could create a SpanReceiver to gather times
we are interested in for the report above.
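
A minimal sketch of such a receiver, assuming the HTrace 3.x interfaces
(org.apache.htrace.SpanReceiver and org.apache.htrace.Span); the class name
and the report format are made up:

import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

import org.apache.htrace.Span;
import org.apache.htrace.SpanReceiver;

// Aggregates span durations by description so a human-readable timing
// report can be printed at the end of a run.
public class TimingReportReceiver implements SpanReceiver {

  private final Map<String,LongAdder> totalMillis = new ConcurrentHashMap<>();

  @Override
  public void receiveSpan(Span span) {
    totalMillis.computeIfAbsent(span.getDescription(), d -> new LongAdder())
        .add(span.getAccumulatedMillis());
  }

  @Override
  public void close() throws IOException {
    totalMillis.forEach(
        (desc, ms) -> System.out.printf("%-50s %8d ms%n", desc, ms.sum()));
  }
}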


On Sat, Aug 13, 2016 at 1:48 PM, Josh Elser <josh.elser@gmail.com> wrote:

> You're completely right. The separation of performance tests and
> correctness tests is one path forward. I think my only concern there is
> that, in our past, these tests tend to be ignored and die.
>
> I think the reason this is in the normal bucket of ITs is just because we
> don't have rigor in your 4th point about perf evaluations.
>
> Maybe, we could make some junit category to annotate such tests and make
> them runnable via Maven, removing them from normal execution. I think that
> would be an acceptable way forward.
>
> However, that would leave us with no end-to-end test for ACCUMULO-3327
> which isn't great..
>
>
> Dylan Hutchison wrote:
>
>> Hi Josh,
>>
>> Forgive me for the design question, but shouldn't we distinguish tests of
>> correctness from tests of performance? The following is my understanding
>> of
>> test categories, which does not totally align with Accumulo's test suite:
>>
>> * Unit tests test individual components.
>> * Integration tests test components working together. They may require more
>> resources such as starting an Accumulo (MAC or real).
>> * Examples are executable code separate from the above, that an outside
>> developer or user can read to see how Accumulo is used. Examples have
>> their
>> own tests.
>> * Performance evaluations are executable code separate from the above.
>> They
>> range in complexity from simple "test bulk imports" to RandomWalk with
>> agitation.
>>
>> If performance evaluations run separately, then developers can treat them
>> like benchmarks, comparing times to those on similar hardware or across
>> commits.
>>
>> Could you remind me of the reasons why we keep performance tests in the
>> standard set of ITs?
>>
>> On Aug 13, 2016 1:03 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>
>> I had assumed this test would pass locally (early-2013 MBP, 2.7 GHz Intel
>>> Core i7, 16G ram), but nope! 38s and 45+ seconds on two runs.
>>>
>>> Josh Elser wrote:
>>>
>>> Hi,
>>>>
>>>> I have some complaints about FastBulkImportIT (a test added with
>>>> https://issues.apache.org/jira/browse/ACCUMULO-3327) but no good ideas
>>>> for how to better test it. As it presently stands, it is a very
>>>> subjective test WRT the kind of hardware used to run it.
>>>>
>>>> The test launches a 3-tserver MAC instance, creates about 585 splits on
>>>> a table, creates 100 files with ~1200 key-value pairs, and then waits
>>>> for the table to be balanced.
>>>>
>>>> At this point, it imports these files into that table and fails if that
>>>> takes longer than 30s.
>>>>
>>>> On my VPS (3core, 6G ram, "SSD"), the bulk import takes ~45 seconds.
>>>> This test will never pass on this node which bothers me because I am of
>>>> the opinion that anyone (with reasonable hardware) should be able to run
>>>> our tests (and to make sure it's clear: I believe this is reasonable
>>>> hardware).
>>>>
>>>> Does anyone have any thoughts on how we could stabilize this test for
>>>> developers?
>>>>
>>>> - Josh
>>>>
>>>>
>>
