commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gary Gregory <garydgreg...@gmail.com>
Subject Re: [Math] How fast is fast enough?
Date Sat, 06 Feb 2016 02:38:26 GMT
On Fri, Feb 5, 2016 at 6:32 PM, Gilles <gilles@harfang.homelinux.org> wrote:

> On Sat, 6 Feb 2016 01:53:54 +0000, Niall Pemberton wrote:
>
>> Are you not concerned about forming a TLP of 7 around Math when one of the
>> seven is clearly not a happy camper?
>>
>
> Of course I am.
> Besides stopping to annoy non-Math Commons developers, I've asked
> that we clarify the positive objectives of the move, as far as
> development would be concerned.
> No one except Thomas seemed interested.
>
> Gilles
>
>
>> Niall
>>
>> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <phil.steitz@gmail.com>
>> wrote:
>>
>> On 2/5/16 12:59 PM, Gilles wrote:
>>> > On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>>> >> On 2/4/16 3:59 PM, Gilles wrote:
>>> >>> Hi.
>>> >>>
>>> >>> Here is a micro-benchmark report (performed with "PerfTestUtils"):
>>> >>> -----
>>> >>> nextInt() (calls per timed block: 2000000, timed blocks: 100, time
>>> >>> unit: ms)
>>> >>>                         name time/call std dev total time ratio
>>> >>> cv difference
>>> >>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
>>> >>> 0.26 0.0000e+00
>>> >>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
>>> >>> 0.15 -1.2900e+02
>>> >>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
>>> >>> 0.04 2.1032e+02
>>> >>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
>>> >>> 0.14 5.1945e+02
>>> >>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
>>> >>> 0.14 8.1451e+02
>>> >>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
>>> >>> 0.06 9.7816e+02
>>> >>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
>>> >>> 0.08 1.6602e+03
>>> >>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
>>> >>> 0.14 1.7301e+03
>>> >>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
>>> >>> 0.16 1.6139e+02
>>> >>> -----
>>> >>> where "cv" is the ratio of the 3rd to the 2nd column.
>>> >>>
>>> >>> Questions are:
>>> >>> * How meaningful are micro-benchmarks when the timed operation has
>>> >>> a very
>>> >>>   small duration (wrt e.g. the duration of other machine
>>> >>> instructions that
>>> >>>   are required to perform them)?
>>> >>
>>> >> It is harder to get good benchmarks for shorter duration activities,
>>> >> but not impossible.  One thing that it would be good to do is to
>>> >> compare these results with JMH [1].
>>>
>>
We use JMH over at Log4j and it does a fine job. See the log4j-perf module.


Gary

> >
>>> > I was expecting insights based on the benchmark which I did run.
>>>
>>> You asked whether or not benchmarks are meaningful when the task
>>> being benchmarked is short duration.  I answered that question.
>>> >
>>> > We have a tool in CM; if it's wrong, we should remove it.
>>> > How its results compare with JMH is an interesting question,
>>>
>>> I will look into this.
>>> > I
>>> > agree, but I don't have time to make an analysis of benchmarking
>>> > tools (on top of what I've been doing since December because
>>> > totally innocuous changes in the RNG classes were frowned upon
>>> > out of baseless fear).
>>>
>>> Please cut the hypberbole.
>>> >
>>> >>> * In a given environment (HW, OS, JVM), is there a lower limit
>>> >>> (absolute
>>> >>>   duration) below which anything will be deemed good enough?
>>> >>
>>> >> That depends completely on the application.
>>> >
>>> > Sorry, I thought that it was obvious: I don't speak of applications
>>> > that don't care about performance. :-)
>>> >
>>> > For those that do, I do not agree with the statement: the question
>>> > relates to finding a point below which it is the environment that
>>> > overwhelms the other conditions.
>>> > A point where there will be _unavoidable_ overhead (transferring data
>>> > from/to memory, JVM book-keeping, ...) and perturbations (context
>>> > switches, ...) such that their duration adds a constant time (on
>>> > average) that may render most enhancements to an already efficient
>>> > algorithm barely noticeable in practice.
>>> > Similarly, but in the opposite direction, some language constructs
>>> > or design choices might slow down things a bit, but without
>>> > endangering any user.
>>> >
>>> > A problem arises when any enhancement to the design is deemed
>>> > harmful because it degrades a micro-benchmark, even though that
>>> > benchmark may not reflect any real use-cases.
>>> > Then, the real harm is against development.
>>> >
>>> >>> * Can a library like CM admit a trade-off between ultimate
>>> >>> performance and
>>> >>>   good design?   IOW, is there an acceptable overhead in exchange
>>> >>> for other qualities
>>> >>>   (clarity, non-redundancy, extensibility, etc.)?
>>> >>
>>> >> That is too general a question to be meaningful.   We need to look
>>> >> at specific cases.  What exactly are you proposing?
>>> >
>>> > <rant>
>>> > It is quite meaningful even if it refers to general principles.
>>> > Those could (should, IMO) be taken into account when managing a
>>> > project like CM, on a par with "performance" (whose intrinsic value
>>> > is never questioned).
>>> > </rant>
>>>
>>> Rant all you want.  Vague generalities and hyperbole have no value.
>>> >
>>> > Two specific cases are:
>>> > * inheritance vs delegation (a.k.a. composition)
>>> > * generics (that could require runtime casts)
>>>
>>> This is getting closer to meaningful.  Where exactly in the code are
>>> you wanting to use something and seeing benchmark damage?
>>> >
>>> >>> * Does ultimate performance for the base functionality (generation
>>> >>> of a
>>> >>>   random number) trump any consideration of use-cases that would
>>> >>> need an
>>> >>>   extension (of the base functionality, such as computation to
>>> >>> match another
>>> >>>   distribution) that will unavoidably degrades the performance
>>> >>> (hence the
>>> >>>   micro-benchmark will be completely misleading for those users)?
>>> >>
>>> >> Again, this is vague and the answer depends on what exactly you are
>>> >> talking about. Significantly damaging performance of PRNG
>>> >> implementations is a bad idea,
>>> >
>>> > Now, *this* is vague: what do you mean by "significantly"?
>>> > That was actually my question in the first place.
>>> If you are talking about PRNG performance, I would say a 1% hit is
>>> significant.
>>> > Referring to the
>>> > benchmark above, people who'd know why they require ultimate
>>> > performance
>>> > should be able to tell what range of numbers they'd find
>>> > acceptable in
>>> > that table.
>>> >
>>> > <rant>
>>> > Actually my questions are very precise, but the answers would require
>>> > some decent analysis, rather than the usual "bad idea" dismissal.
>>> > </rant>
>>> >
>>> > In the Javadoc of the "random" package, there is information about
>>> > performance but no reference as to the benchmarking procedure.
>>>
>>> It would be great to repeat these using JMH, which is emerging as a
>>> de facto standard for java benchmarking.  I will look into this.
>>> >
>>> > I can consistently observe a totally different behaviour (using
>>> > "PerfTestUtils"):
>>> >  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
>>> >  2. moreover, the ratio *grows* with each of the longer periods
>>> >     members of the WELL family (see the above table).
>>> >
>>> > This makes me wonder how someone who purports to need "ultimate"
>>> > performance can have any objective basis to determine what is good
>>> > or bad for his own applications.
>>> >
>>> >> unless there are actual practical use
>>> >> cases you can point to that whatever changes you are proposing
>>> >> enable.
>>> >
>>> > As I've explained in very much details in another thread, I've
>>> > reviewed (from a design POV) the RNG code in "random" and IMHO, there
>>> > is room for improvement (cf. above for what I mean by that term).
>>> > <rant>
>>> > I have some code ready for review but I had to resort to what I
>>> > considered sub-optimal design (preemptively renouncing to propose a
>>> > "delegation"-based design) solely because of the destructive
>>> > community
>>> > process that takes place here.[1]
>>> > </rant>
>>>
>>> More vague hyperbole that serves no purpose.  Please focus on actual
>>> code or design issues.
>>> >
>>> > The practical use-cases is anything that needs further processing of
>>> > the numbers produced according to a uniform distribution:
>>>
>>> Isn't that what the samplers in the distributions package do?  What
>>> we need from the PRNG implementations is just blocks of bits.  Since
>>> we wanted a pluggable replacement for j.u.Random, we added uniform
>>> ints, longs and floats and gaussian floats.  The samplers just need
>>> uniform doubles.  The practical use case we need is well-supported
>>> in the code we have.  What is missing, exactly?
>>> > I agree that
>>> > there would be little sense to code that latter part in a "pure" OO
>>> > way[2].  And Luc made it indeed quite efficient, I think, in the
>>> > various
>>> > concrete classes.
>>> > What I want to reconsider is how those concrete low-level
>>> > algorithms can
>>> > be plugged in a higher-level function that just requires a "source of
>>> > randomness", as I'd call a provider of "int" (or "long") values,
>>> > where
>>> > the high level functionality does not care at all about the
>>> > provider's
>>> > inner working (a.o. how it's seeded!).
>>>
>>> This is why many higher-level samplers and other things that require
>>> random data inside [math] have a pluggable RandomGenerator.
>>> >
>>> > A case in point is the sampling of other distributions (namely the
>>> > Normal distribution).
>>>
>>> Or any of the others.  We have a default, inversion-based method
>>> that the abstract distribution classes provide and some pretty good
>>> specialized implementations within individual distributions.  Most
>>> of these just require uniform random doubles as source.
>>>
>>> >
>>> > Here is the benchmark report:
>>> > -----
>>> > nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
>>> > time unit: ms)
>>> >                         name time/call std dev total time ratio
>>> > cv difference
>>> > o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
>>> > 0.14 0.0000e+00
>>> > o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
>>> > 0.07 1.2892e+04
>>> >    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
>>> > 0.06 1.0393e+04
>>> >           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
>>> > 0.07 1.1360e+04
>>> >          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
>>> > 0.04 1.1513e+04
>>> >         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
>>> > 0.03 1.2125e+04
>>> >         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
>>> > 0.06 1.1928e+04
>>> >         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
>>> > 0.04 1.3931e+04
>>> >         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
>>> > 0.06 1.4118e+04
>>> >        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
>>> > 0.08 1.1049e+04
>>> > -----
>>> > where the first line is JDK's "nextInt()" and the remaining are
>>> > "nextGaussian()".
>>> >
>>> > The generation time is thus about 4-fold that of "nextInt()".
>>> > Thus, degrading the performance of "nextInt()" by 10% would
>>> > degrade the
>>> > performance of "nextGaussian()" by half that.
>>> >
>>> > For a performance discussion to be meaningful, I think that we'd need
>>> > to know how that fact would affect, even modestly, any moderately
>>> > complex
>>> > post-processing of the generated values.
>>> >
>>> > Another case, for modularity, would be to consider that other
>>> > algorithms could
>>> > be implemented to provide the required distribution.[3]
>>> > In the current design (inheritance-based), that can only be done
>>> > by creating
>>> > a subclass, even though the core functionality ("nextDouble()") is
>>> > not
>>> > overridden.
>>> >
>>> >>> * What are usages of the CM RNGs?
>>> >>>   Do those use-cases strictly forbid "loosing" a dozen
>>> >>> milliseconds per
>>> >>>   million calls?
>>> >>
>>> >> There are many different use cases.  My own applications use them in
>>> >> simulations to generate random deviates, to generate random hex
>>> >> strings as identifiers and in stochastic algorithms like some of our
>>> >> internal uses.  The last case is definitely sensitive to PRNG
>>> >> performance.
>>> >
>>> > Thanks for giving examples, but since we talk about performance, I
>>> > was hoping for some real flesh, like the relative duration of numbers
>>> > generation (e.g. the total duration of calls to the "RandomGenerator"
>>> > instances wrt to the total duration of the application).
>>> >
>>> > I don't know if by "last case", you are referring to code that is
>>> > inside CM.  I didn't spot anything that makes "heavy" usage of a
>>> > RNG (in the sense that generation would count as a sizable part of
>>> > the whole processing).
>>> monteCarloP in KolmogorovSmirnovTest is one to check.
>>> >
>>> > As I pointed out many times: if an application is severely dependent
>>> > on the performance of RNG, the user probably will turn to specific
>>> > tools (e.g. GPUs? [4]) rather than use CM.
>>>
>>> That is a bogus argument.  We should make our PRNGs simple and fast
>>> so their use can extend to performance-sensitive applications.
>>> >
>>> > Conversely, using Java might be preferred for its flexibility, which
>>> > is destroyed by a search for ultimate performance (which nobody seems
>>> > able to define reasonably).
>>> > Performance is not a goal in itself; it should not be a trophy which
>>> > sits uselessly on a shelf.
>>>
>>> Nor should "beautiful design" in the eyes of one person.
>>> >
>>> > My goal is not to deliberately slow things down; it is to allow some
>>> > leeway so that designs which are deemed better (on all counts except,
>>> > perhaps, performance) are given a chance to show their strengths, in
>>> > particular in areas where performance in absolute terms is "good
>>> > enough" for all use-cases which CM should care about (hence the need
>>> > of actual data points[5]).
>>>
>>> I see no reason that we can't have it both ways - good design and
>>> good performance. What we have now, modulo maybe some small changes
>>> to reduce code duplication, works fine.  If you want to play with
>>> 64-bit generators and can find reference implementations and verify
>>> that they do in fact perform better, great.  If not, I don't see the
>>> point.  You can rant and complain all you want; but I am not going
>>> to let us trash performance or correctness of code in the random
>>> class or anywhere else just because you think it is somehow "better
>>> designed"  unless you can show specific, practical use cases
>>> demonstrating the value of the changes.
>>>
>>> Phil
>>> >
>>> >
>>> > Gilles
>>> >
>>> > [1] "Is it faster?"
>>> >     "No."
>>> >     "Then, no."
>>> > [2] Although that is in some sense what you indirectly defend by
>>> > wanting
>>> >     to stick with a meaningless "next(int bits)" method.
>>> > [3] http://www.doornik.com/research/ziggurat.pdf
>>> > [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
>>> > [5] Hence the need to agree on a methodology/policy for benchmarking.
>>> >
>>> >>
>>> >> Phil
>>> >>
>>> >> [1] http://openjdk.java.net/projects/code-tools/jmh/
>>> >>>   IOW, would those users for which such a difference matters use
>>> >>> CM at all?
>>> >>
>>> >>>
>>> >>>
>>> >>> Thanks,
>>> >>> Gilles
>>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


-- 
E-Mail: garydgregory@gmail.com | ggregory@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message