From: Matt Corgan <mcorgan@hotpads.com>
To: dev@hbase.apache.org
Date: Thu, 21 Jun 2012 12:36:13 -0700
Subject: Re: Performance Testing

These are geared more towards development than regression testing, but
here are a few ideas that I would find useful:

* The ability to run the performance tests (or at least a subset of them)
  on a development machine, which would help people avoid committing
  regressions and would speed up development in general
* The ability to test a single region without heavier-weight servers and
  clusters
* Letting the tests run with multiple combinations of input parameters
  (block size, compression, blooms, encoding, flush size, etc.); possibly
  many combinations, which could take a while to run
* Output of the results to a CSV file that is importable into a
  spreadsheet for sorting/filtering/charting
* Emailing the CSV file to the user to notify them that the tests have
  finished
* Getting fancier: the ability to specify a list of branches or tags from
  git or subversion as inputs, which would let a developer tag many
  different performance changes and later figure out which combination is
  best (all before submitting a patch)
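To make the parameter-sweep and CSV ideas concrete, here is a minimal
sketch of what such a harness could look like. It is plain Java; the
runSingleRegionTest() hook, the parameter axes, and the CSV columns are
all made up for illustration and are not tied to any existing HBase tool:

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    public class PerfSweep {

        // Hypothetical parameter axes to sweep; real values would come
        // from configuration or the command line.
        static final int[] BLOCK_SIZES = {4096, 16384, 65536};
        static final String[] COMPRESSIONS = {"NONE", "GZ", "LZO"};
        static final boolean[] BLOOM_FILTERS = {false, true};

        public static void main(String[] args) throws IOException {
            PrintWriter csv = new PrintWriter(new FileWriter("perf-results.csv"));
            try {
                // One header row, then one row per parameter combination.
                csv.println("blockSize,compression,bloom,opsPerSec");
                for (int blockSize : BLOCK_SIZES) {
                    for (String compression : COMPRESSIONS) {
                        for (boolean bloom : BLOOM_FILTERS) {
                            double opsPerSec =
                                runSingleRegionTest(blockSize, compression, bloom);
                            csv.printf("%d,%s,%b,%.1f%n",
                                blockSize, compression, bloom, opsPerSec);
                        }
                    }
                }
            } finally {
                csv.close();
            }
        }

        // Placeholder: wire this up to whatever single-region benchmark
        // we end up building, and return the measured throughput.
        static double runSingleRegionTest(int blockSize, String compression,
                boolean bloom) {
            return 0.0; // TODO
        }
    }

A spreadsheet can sort/filter/chart the resulting CSV directly, and the
same file is what would get attached to the notification email.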
On Thu, Jun 21, 2012 at 12:13 PM, Elliott Clark wrote:

> I actually think that more measurements are needed than just one per
> release. The best I could hope for would be a four-node-plus cluster
> (one master and three slaves) that runs multiple different perf tests
> for every check-in on trunk:
>
>  - All reads (scans)
>  - Large writes (should test compactions/flushes)
>  - Read-dominated with 10% writes
>
> Then every check-in can be evaluated, and large regressions can be
> treated as bugs. And with that we can see the difference between the
> different versions as well. http://arewefastyet.com/ is kind of the
> model that I would love to see. And I'm more than willing to help
> wherever needed.
>
> In reality, however, every night will probably be more feasible. And
> four nodes is probably not going to happen either.
>
> On Thu, Jun 21, 2012 at 11:38 AM, Andrew Purtell wrote:
>
> > On Wed, Jun 20, 2012 at 10:37 PM, Ryan Ausanka-Crues wrote:
> > > I think it makes sense to start by defining the goals for the
> > > performance testing project and then deciding what we'd like to
> > > accomplish. As such, I'll start by soliciting ideas from everyone
> > > on what they would like to see from the project. We can then
> > > collate those thoughts and prioritize the different features. Does
> > > that sound like a reasonable approach?
> >
> > In terms of defining a goal, the fundamental need I see for us as a
> > project is to quantify performance from one release to the next, and
> > thus be able to avoid regressions by noting adverse changes in
> > release candidates.
> >
> > In terms of defining what "performance" means... well, that's an
> > involved and separate discussion, I think.
> >
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
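As a footnote to the workloads Elliott lists above, the read-dominated
mix is easy to sketch against the 0.94-era client API. Everything below
(table name, column family, key space, operation count) is an arbitrary
placeholder chosen for illustration:

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Minimal sketch of a read-dominated (roughly 90% reads / 10% writes)
    // workload. Not a proposal for actual benchmark defaults.
    public class MixedWorkload {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "perftest");
            byte[] family = Bytes.toBytes("f");
            byte[] qualifier = Bytes.toBytes("q");
            Random rnd = new Random();

            for (long i = 0; i < 1000000; i++) {
                byte[] row = Bytes.toBytes(rnd.nextInt(10000000));
                if (rnd.nextInt(10) == 0) {
                    // ~10% of operations are writes
                    Put put = new Put(row);
                    put.add(family, qualifier, Bytes.toBytes(i));
                    table.put(put);
                } else {
                    // ~90% of operations are point reads
                    table.get(new Get(row));
                }
            }
            table.close();
        }
    }

Timing the loop with System.currentTimeMillis() before and after would
give a throughput number to feed into a CSV like the one sketched
earlier in this mail.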