river-dev mailing list archives

From Dan Creswell <dan.cresw...@gmail.com>
Subject Re: Space/outrigger suggestions (remote iterator vs. collection)
Date Wed, 22 Dec 2010 21:02:44 GMT
So I agree with the common structure for analysis requirement - it's
essential for any form of performance and capacity stuff. However I'm also
lazy so would quite happily leave someone else to spec/build that :)

What I would say is that if I were to do such a thing I'd be tempted more by
JSON which is often a little less verbose than XML for many things and has
plenty of libraries for parsing etc in almost any language. And the other
thing I'd say is that if you're doing the work and prefer XML, you should
get ultimate choice.
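
To make the JSON idea a bit more concrete, here is a minimal sketch of how two bulk runs might be compared automatically. Everything in it is an assumption for illustration: the result layout (`results`, `name`, `ops_per_sec`), the function names, and the 10% threshold are hypothetical, not anything River or Blitz currently produces.

```python
import json

# Hypothetical result format: one JSON document per bulk run, holding a list
# of {"name": ..., "ops_per_sec": ...} records. Not an existing River format.
def load_run(text):
    """Parse a run's JSON text into a {benchmark_name: ops_per_sec} dict."""
    return {r["name"]: r["ops_per_sec"] for r in json.loads(text)["results"]}

def compare_runs(old, new, threshold=0.10):
    """Return {name: relative_change} for benchmarks present in both runs
    whose throughput moved by more than `threshold` (an assumed 10%)."""
    changes = {}
    for name in sorted(old.keys() & new.keys()):
        if old[name] == 0:
            continue  # relative change is undefined for a zero baseline
        rel = (new[name] - old[name]) / old[name]
        if abs(rel) > threshold:
            changes[name] = rel
    return changes

# Made-up numbers, purely to exercise the comparison.
previous = '{"results": [{"name": "write", "ops_per_sec": 1000.0}, {"name": "take", "ops_per_sec": 800.0}]}'
current = '{"results": [{"name": "write", "ops_per_sec": 1200.0}, {"name": "take", "ops_per_sec": 790.0}]}'

significant = compare_runs(load_run(previous), load_run(current))
for name, rel in significant.items():
    print(f"{name}: {rel:+.1%}")
```

A fixed relative threshold is the simplest possible notion of "significant"; with repeated runs per benchmark you would probably want to compare against the observed run-to-run variance instead.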

I've built up a variety of benchmarks for Blitz. Some of them are basic
operation exercisers (how many takes or writes can I do etc) and I use those
for simple tuning exercises. The ones I deem more important though are a
collection based on real user application behaviour. Micro benchmarks are
fine and all but don't mean so much in the real world.

I don't believe there's a standard set of benchmarks that everyone does but
I'm happy to summarise what I've personally done if that's useful. I
wouldn't offer up the code though as it's been organically built up over
time and is great for me and my way of working but not so amenable to what
you want or indeed particularly readable. In my judgement it'd be better to
re-code from scratch...

On 22 December 2010 20:51, Patricia Shanahan <pats@acm.org> wrote:

> Ideally, I'd like to get enough common structure in the output file formats
> that I can automate comparing a new bulk run to a previous bulk run, and
> highlighting significant changes.
> For my recent dissertation research, I needed to compare results of large
> numbers of simulation runs. I found XML a reasonable compromise between
> human readability and machine processing.
> Not an immediate concern - I'd like to have the problem of having too many
> performance tests for manual handling.
> Do you know of any existing benchmarks we could use?
> Patricia
> On 12/22/2010 12:30 PM, Dan Creswell wrote:
>> I agree with the need for performance tests.
>> From my own experience I'd say you'd want to be able to run those tests in
>> isolation but also together to get a big picture view of a change because
>> spaces being what they are, it's incredibly easy for an optimisation that
>> improves one test to cripple another.
>> On 22 December 2010 19:08, Patricia Shanahan <pats@acm.org> wrote:
>>> On 12/22/2010 10:57 AM, jgrahn@simulexinc.com wrote:
>>> ...
>>>> This is the biggest concern, I think. As such, I'd be interested in
>>>> seeing performance runs, to back up the intuition.   Then, at least,
>>>> we'd know precisely what trade-off we're talking about.
>>>> The test would need to cover both small batches and large, both in
>>>> multiples of the batch-size/takeMultipleLimit and for numbers off of
>>>> those multiples, with transactions and without.
>>> I think we need a lot of performance tests, some way to organize them,
>>> and some way to retain their results.
>>> I propose adding a "performance" folder to the River trunk, with
>>> subdirectories "src" and "results". src would contain benchmark source
>>> code. results would contain benchmark output.
>>> System level tests could have their own package hierarchy, under
>>> org.apache.impl, but reflecting what is being measured. Unit level tests
>>> would need to follow the package hierarchy for the code being tested, to
>>> get package access. The results hierarchy would mirror that src hierarchy
>>> for the tests.
>>> Any ideas, alternatives, changes, improvements?
>>> Patricia