hbase-dev mailing list archives

From Ryan Ausanka-Crues <r...@palominolabs.com>
Subject Re: Performance Testing
Date Thu, 21 Jun 2012 23:29:31 GMT
Other ideas that have been thrown around:
- Compile a donated collection of real-world datasets that can be used in tests
- Ability to replay WALs: https://issues.apache.org/jira/browse/HBASE-6218
- Find someone to donate a cluster of machines that the tests can be run on, to ensure consistency
- Integrate with iTest/Bigtop
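
As a rough illustration of the "donated datasets" idea, a loader that pulls a tab-separated dump into a test table could be as small as the sketch below. It uses the HTable/Put client API of that era; the table name, file layout, and column family are made up for the example.

    // Minimal sketch: load a donated tab-separated dataset into a test table.
    // Table name, file path, and column family are hypothetical.
    import java.io.BufferedReader;
    import java.io.FileReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DatasetLoader {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "perf_dataset");   // hypothetical test table
        table.setAutoFlush(false);                         // batch puts for throughput
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
          String[] fields = line.split("\t", 2);           // rowkey <TAB> value
          Put put = new Put(Bytes.toBytes(fields[0]));
          put.add(Bytes.toBytes("d"), Bytes.toBytes("v"), Bytes.toBytes(fields[1]));
          table.put(put);
        }
        in.close();
        table.flushCommits();
        table.close();
      }
    }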

---
Ryan Ausanka-Crues
CEO
Palomino Labs, Inc.
ryan@palominolabs.com
(m) 805.242.2486

On Jun 21, 2012, at 3:05 PM, Matt Corgan wrote:

> just brainstorming =)
> 
> Some of those are motivated by the performance tests I wrote for data block
> encoding: https://github.com/hotpads/hbase-prefix-trie/tree/master/test/org/apache/hadoop/hbase/cell/pt/test/performance/seek
> In that directory:
> 
> * SeekBenchmarkMain gathers all of the test parameters.  Perhaps we could
> have a test configuration input file format where standard test configs are
> put in source control
> * For each combination of input parameters it runs a SingleSeekBenchmark
> * As it runs, the SingleSeekBenchmark adds results to a SeekBenchmarkResult
> * Each SeekBenchmarkResult is logged after each SingleSeekBenchmark, and
> all of them are logged again at the end for pasting into a spreadsheet
> 
> They're probably too customized to my use case, but maybe we can draw ideas
> from the structure/workflow and make it applicable to more use cases.
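
To make that structure/workflow concrete, here is a rough sketch of the shape described above, not the actual code (class and field names only mirror the ones mentioned): enumerate parameter combinations, run one benchmark per combination, log each result as it completes, then log everything again at the end.

    // Sketch of the workflow described above; names are illustrative only.
    import java.util.ArrayList;
    import java.util.List;

    public class SeekBenchmarkSketch {

      // One combination of input parameters (analogous to what SeekBenchmarkMain gathers).
      static class BenchmarkParams {
        final int blockSize;
        final String compression;
        BenchmarkParams(int blockSize, String compression) {
          this.blockSize = blockSize;
          this.compression = compression;
        }
      }

      // Result of a single run (analogous to SeekBenchmarkResult).
      static class BenchmarkResult {
        final BenchmarkParams params;
        final long seeksPerSecond;
        BenchmarkResult(BenchmarkParams params, long seeksPerSecond) {
          this.params = params;
          this.seeksPerSecond = seeksPerSecond;
        }
        @Override public String toString() {
          return params.blockSize + "\t" + params.compression + "\t" + seeksPerSecond;
        }
      }

      public static void main(String[] args) {
        List<BenchmarkResult> results = new ArrayList<BenchmarkResult>();
        int[] blockSizes = {4 * 1024, 64 * 1024};
        String[] compressions = {"NONE", "GZ"};
        for (int blockSize : blockSizes) {
          for (String compression : compressions) {
            BenchmarkParams params = new BenchmarkParams(blockSize, compression);
            BenchmarkResult result = runSingleBenchmark(params);  // analogous to SingleSeekBenchmark
            System.out.println(result);                           // log each result as it completes
            results.add(result);
          }
        }
        // Log everything again at the end for pasting into a spreadsheet.
        for (BenchmarkResult result : results) {
          System.out.println(result);
        }
      }

      // Placeholder for the actual measurement; a real runner would seek through HFiles here.
      static BenchmarkResult runSingleBenchmark(BenchmarkParams params) {
        return new BenchmarkResult(params, 0L);
      }
    }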
> 
> 
> On Thu, Jun 21, 2012 at 2:47 PM, Andrew Purtell <apurtell@apache.org> wrote:
> 
>> Concur. That's ambitious!
>> 
>> On Thu, Jun 21, 2012 at 1:57 PM, Ryan Ausanka-Crues
>> <ryan@palominolabs.com> wrote:
>>> Thanks Matt. These are great!
>>> ---
>>> Ryan Ausanka-Crues
>>> CEO
>>> Palomino Labs, Inc.
>>> ryan@palominolabs.com
>>> (m) 805.242.2486
>>> 
>>> On Jun 21, 2012, at 12:36 PM, Matt Corgan wrote:
>>> 
>>>> These are geared more towards development than regression testing, but here
>>>> are a few ideas that I would find useful:
>>>> 
>>>> * Ability to run the performance tests (or at least a subset of them) on a
>>>> development machine would help people avoid committing regressions and
>>>> would speed development in general
>>>> * Ability to test a single region without heavier-weight servers and
>>>> clusters
>>>> * Letting the test run with multiple combinations of input parameters
>>>> (block size, compression, blooms, encoding, flush size, etc.).
>>>> Possibly many combinations that could take a while to run
>>>> * Output results to a CSV file that's importable to a spreadsheet for
>>>> sorting/filtering/charting.
>>>> * Email the CSV file to the user notifying them that the tests have finished.
>>>> * Getting fancier: ability to specify a list of branches or tags from git
>>>> or subversion as inputs, which would allow the developer to tag many
>>>> different performance changes and later figure out which combination is the
>>>> best (all before submitting a patch)
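
For the CSV output idea, something as small as the following sketch would do; the column names are just examples of the parameters and measurements mentioned above, not a fixed schema.

    // Sketch of writing per-run results to a CSV file for spreadsheet import.
    // Column names and the shape of the result rows are illustrative only.
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;
    import java.util.List;

    public class CsvResultWriter {

      public static void writeCsv(String path, List<String[]> rows) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(path));
        try {
          // Header row: one column per test parameter plus the measured values.
          out.println("blockSize,compression,bloomFilter,encoding,throughputOpsPerSec,p99LatencyMs");
          for (String[] row : rows) {
            out.println(join(row));
          }
        } finally {
          out.close();
        }
      }

      private static String join(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
          if (i > 0) sb.append(',');
          sb.append(fields[i]);
        }
        return sb.toString();
      }
    }

The emailing step could then be left to whatever drives the run, for example a CI job or a wrapper script.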
>>>> 
>>>> 
>>>> On Thu, Jun 21, 2012 at 12:13 PM, Elliott Clark <eclark@stumbleupon.com>
>>>> wrote:
>>>> 
>>>>> I actually think that more measurements are needed than just per release.
>>>>> The best I could hope for would be a four-node+ cluster (one master and
>>>>> three slaves) that, for every check-in on trunk, runs multiple different
>>>>> perf tests:
>>>>> 
>>>>> 
>>>>> - All Reads (Scans)
>>>>> - Large Writes (Should test compactions/flushes)
>>>>> - Read Dominated with 10% writes
>>>>> 
>>>>> Then every check-in can be evaluated and large regressions can be treated
>>>>> as bugs.  And with that we can see the difference between the different
>>>>> versions as well. http://arewefastyet.com/ is kind of the model that I
>>>>> would love to see.  And I'm more than willing to help wherever needed.
>>>>> 
>>>>> However, in reality, running every night will probably be more feasible.
>>>>> And four nodes is probably not going to happen either.
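
As a sketch only, the three workload mixes above could be captured as data that a nightly driver iterates over; the class and the percentages below just restate the list, nothing here is an existing tool.

    // Sketch: the three workload mixes above as data for a nightly perf driver.
    public class WorkloadMixes {

      static class WorkloadMix {
        final String name;
        final double readFraction;   // fraction of operations that are reads/scans
        final double writeFraction;  // fraction of operations that are writes
        WorkloadMix(String name, double readFraction, double writeFraction) {
          this.name = name;
          this.readFraction = readFraction;
          this.writeFraction = writeFraction;
        }
      }

      static final WorkloadMix[] MIXES = {
          new WorkloadMix("all-reads-scans", 1.00, 0.00),
          new WorkloadMix("large-writes", 0.00, 1.00),              // should exercise flushes/compactions
          new WorkloadMix("read-dominated-10pct-writes", 0.90, 0.10),
      };

      public static void main(String[] args) {
        for (WorkloadMix mix : MIXES) {
          // A real driver would launch the load against the test cluster here
          // and record throughput/latency for the check-in being tested.
          System.out.println("would run workload: " + mix.name);
        }
      }
    }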
>>>>> 
>>>>> On Thu, Jun 21, 2012 at 11:38 AM, Andrew Purtell <apurtell@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> On Wed, Jun 20, 2012 at 10:37 PM, Ryan Ausanka-Crues
>>>>>> <ryan@palominolabs.com> wrote:
>>>>>>> I think it makes sense to start by defining the goals for the
>>>>>>> performance testing project and then deciding what we'd like to
>>>>>>> accomplish. As such, I'll start by soliciting ideas from everyone on
>>>>>>> what they would like to see from the project. We can then collate
>>>>>>> those thoughts and prioritize the different features. Does that sound
>>>>>>> like a reasonable approach?
>>>>>> 
>>>>>> In terms of defining a goal, the fundamental need I see for us as a
>>>>>> project is to quantify performance from one release to the next, thus
>>>>>> be able to avoid regressions by noting adverse changes in release
>>>>>> candidates.
>>>>>> 
>>>>>> In terms of defining what "performance" means... well, that's an
>>>>>> involved and separate discussion I think.
>>>>>> 
>>>>>> Best regards,
>>>>>> 
>>>>>>  - Andy
>>>>>> 
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>>> Hein (via Tom White)
>>>>>> 
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> 
>>   - Andy
>> 
>> Problems worthy of attack prove their worth by hitting back. - Piet
>> Hein (via Tom White)
>> 

