mahout-dev mailing list archives

From Ted Dunning
Subject Re: Flattened arrays of simple structures (valuetype-like classes).
Date Sat, 26 Mar 2011 20:18:00 GMT
On Sat, Mar 26, 2011 at 5:04 AM, Dawid Weiss wrote:

> > They are there to assert things that should never ever be
> > false, if the program is working, no matter what bad input or network
> > faults occur. Turning them off should have zero effect in a working program.
> This is an interesting point of view and I think I like it even better
> than my own, only I would change "zero effect in a working program" to
> "zero effect in a correctly written program". That is adhering to API
> contracts, etc. This was actually my understanding when I was writing
> HPPC -- that once your code passes a million unit/integration tests
> you can trust it enough to run it without assertions in production...
> when/if something breaks, you will know anyway because of malformed
> output or other exceptions and then you can rerun with assertions on
> or add more tests.

I must be a total pessimist.

I don't believe in the existence of "correctly written programs" any more
than I believe in Santa Claus.  There may be tiny moments in time when a
program is correct, but once it becomes more than small or more than a short
period passes between the writing and the rewriting or more than one person
dips their pen, it becomes a dream rather than a reality.

> I guess both viewpoints have pros and cons, so convincing anybody does
> not make much sense, but it's an interesting discussion nonetheless.

Thank you for putting this so generously.

> > I don't like the "checked" and "unchecked" getter idiom. We have such a
> > thing in Vector -- set() and setQuick(). From reading the API, well,
> > who's not going to choose the quick operation over the "slow" one?
> Exactly. I always found it a bit odd when I was presented two versions
> of essentially the same method... and usually went with the "riskier"
> one.

Wow.  I go the opposite way.  I always pick the safer method.  Then if I
have time in my development cycle or the profiler shows that code to be
slow, I go in and use loop invariants to prove the risky calls are safe.

Commonly, I find the risky calls save nothing, especially with modern JVMs.
So I put the safe calls back in.

Occasionally, there is a >10% difference.  Then I put in comments and
argument testing outside the loop to try to ensure that nobody else breaks
the code.
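
A minimal sketch of that pattern in Java, for illustration only. The class
SketchVector and the helper fill() are hypothetical; the method names mirror
set() and setQuick(), but this is not Mahout's or Colt's implementation. The
arguments are tested once outside the loop, which establishes the invariant
that makes the unchecked call inside the loop safe.

```java
// Hypothetical stand-in for a vector with a safe and a risky setter.
// Not Mahout's Vector; names are borrowed only to illustrate the idiom.
final class SketchVector {
    private final double[] values;

    SketchVector(int size) {
        values = new double[size];
    }

    int size() {
        return values.length;
    }

    // Safe: validates the index on every call.
    void set(int index, double value) {
        checkIndex(index);
        values[index] = value;
    }

    // Safe read, used here only to inspect results.
    double get(int index) {
        checkIndex(index);
        return values[index];
    }

    // Risky: no explicit check. Caller must guarantee 0 <= index < size().
    void setQuick(int index, double value) {
        values[index] = value;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= values.length) {
            throw new IndexOutOfBoundsException(
                "index " + index + ", size " + values.length);
        }
    }

    // Fills the half-open range [from, to) with value.
    static void fill(SketchVector v, int from, int to, double value) {
        // Argument testing outside the loop: done once, not per element.
        if (from < 0 || to > v.size() || from > to) {
            throw new IndexOutOfBoundsException(
                "range [" + from + ", " + to + "), size " + v.size());
        }
        // Invariant: 0 <= from <= i < to <= v.size(), so every index
        // passed to setQuick() below is valid. Do not change the loop
        // bounds without revisiting the check above.
        for (int i = from; i < to; i++) {
            v.setQuick(i, value);
        }
    }
}
```

A bad range is rejected before any element is written, so a failed call
leaves the vector untouched.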

When reviewing code, I always view the use of the unsafe versions as a bug
unless there is documentation backing up the benefit and safeguards around
the risky version.

I also try to use the terminology "safe" and "risky" rather than "slow" and
"fast".  The safe/risky terminology is always correct while the slow/fast
terminology is only occasionally correct. The setQuick nomenclature is
something we inherited from Colt and would have a hard time changing even
though it would probably be better called

> > unchecked version is I think tiny. Bad input will just result in an
> > ArrayIndexOutOfBoundsException or NullPointerException quickly anyway.
> It may or it may not, if your storage is larger than your input. I
> guess most of the problems are off-by-one errors and not
> off-by-million (or negative index) errors.

Views of arrays also make this kind of error even harder to see without
explicit checks.
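
A small Java sketch of why, under stated assumptions: ArrayViewSketch is a
hypothetical view class over a larger backing array, not a class from Mahout
or HPPC. An off-by-one index through the unchecked accessor lands inside the
backing storage, so the JVM raises nothing and a neighbouring value comes back
silently. Only the explicit check catches it.

```java
// Hypothetical view of `length` elements starting at `offset` in a larger
// backing array. Illustrative only.
final class ArrayViewSketch {
    private final double[] backing;
    private final int offset;
    private final int length;

    ArrayViewSketch(double[] backing, int offset, int length) {
        if (offset < 0 || length < 0 || offset > backing.length - length) {
            throw new IllegalArgumentException("view does not fit in backing array");
        }
        this.backing = backing;
        this.offset = offset;
        this.length = length;
    }

    int size() {
        return length;
    }

    // Risky: relies only on the JVM's bounds check of the backing array,
    // which knows nothing about where the view ends.
    double getQuick(int i) {
        return backing[offset + i];
    }

    // Safe: checks the index against the bounds of the view itself.
    double get(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + length);
        }
        return backing[offset + i];
    }

    public static void main(String[] args) {
        double[] storage = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        ArrayViewSketch view = new ArrayViewSketch(storage, 2, 3); // covers 2, 3, 4

        // Off-by-one: index 3 is outside the view but inside the storage,
        // so the unchecked read succeeds and returns storage[5].
        System.out.println(view.getQuick(3));

        // The checked read reports the same mistake immediately.
        try {
            view.get(3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```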

> I think a nice addition would be to allow specifying if you want to
> have ifs or assertions (it's generated code anyway) so that people can
> pick what they are comfortable with. I'll add it to the TODO list --
> thanks for inspiration, Sean.


Did you generate this code using a separate code generator step?  Or using a
class loader magic thing?

Can you point us at the heart of the code generator in either case?
