harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Hindess <mark.hind...@googlemail.com>
Subject Re: [classlib][luni][performance] IdentityHashMap implementation
Date Tue, 22 Apr 2008 21:03:16 GMT

On 22 April 2008 at 23:02, "Aleksey Shipilev" <aleksey.shipilev@gmail.com> 
wrote:
>
> Good comments, Mark, thanks! But I have to chill your fury a little :)

I'm might be a little bothered but I'm certainly not furious.  I just
wish to point out issues so that we can make things better.  I'd not
want to see the patch applied in its current form but I've already
moved on from the "clean" issue - by proving to myself that it could be
cleaned up - and I'm looking at the "non-breaking" part now.

> The reason behind all that changes is that entire IdentityHashMap
> implementation was thrown away and replaced by HashMap, then the
> relevant changes that transform it back to IdentityHashMap were done
> (changing hashCode() to System.identityHashCode(), equals to ==, etc).

sigh... this should really have been made clear in the JIRA - see below
for an example of how this could be done.

> So if you diff the HashMap.java and IdentityHashMap.java after the
> patch applied, you should notice very small difference. That's a
> good sign because thus we can easily identify and generalize the
> common code in both implementations, as we suggested to do. That's the
> generalization we talked about with Nathan and Tim, right?

Well, I've not looked at HashMap.java but I suspect that it might have
methods ordered alphabetically too which is horrible and should be
fixed.

Also, when I reverted the spurious javadoc changes back to the
original form, I replaced a number of instances of the specific word
IdentityHashMap with the less specific word Map.  Given the context
doesn't detract from the meaning and does make it easier to read.
Rather than make the IdentityHashMap javadoc more like the HashMap
javadoc I suggest we change the HashMap javadoc in this way too.  (This
is a personal preference but it would make the significant differences
between HashMap.java and IdentityHashMap.java much more obvious.)

In short, making them similar is great but lets make them both better
rather than make IdentityHashMap worse.

> Keeping this in mind, you can see that the assumption of "minimal
> change" can't be preserved

Nonsense.  I have a 33% smaller cleaner patch created in ~10 minutes
using emacs and ediff mode to prove it.  And armed with the extra
information that was missing from the JIRA I can suggest an even better
way see below.

> and reading out the patch itself is pretty useless.

I disagree; I read it and I don't think it was a useless exercise.  If
my reading of the patch resulted in what you referred to above as "good
comments" then clearly it wasn't useless.

Of course, it is not the whole story but then I was never under the
illusion that it was and I am moving on to look at the result of the
patch next.  Personally, I think people underestimate the value of
reading raw patches.  You do get to see just the changes - in a clean
patch anyway - and "bad smells"[0] are often easier to see this way.

> Rather, it makes sense to apply the patch to local copy and then
> see whether the differences between IHM and HM are enough, whether
> resulting IHM implementation contradicts with spec, whether IHM passes
> the tests and runs the workloads well and stuff.

I'm getting to that part.  You stated that the patch was "clean,
non-breaking, consistent implementation, ready to be committed" and I
was testing those assertions in that order.  Although I've probably made
my mind up about the last one too.

> Of course, to avoid these confusions I can rewrite entire Collections
> at once and then contribute a mega-patch, but I don't think this is
> the proper way of doing things.

That's an interesting goal but I don't believe mega-patches are required
to achieve it.  Before I became a committer, I did a huge amount of
work unifying the native code and I did it in *lots* of small steps.
Each step was clear.  I sometimes submitted a series of patches (and/or
scripts) for a single JIRA to make the steps as transparent as possible.
For example, for HARMONY-5771, you might do something like:

  1) Attached a script (or documented) doing:

    svn copy modules/luni/src/main/java/java/util/HashMap.java \
    modules/luni/src/main/java/java/util/IdentityHashMap.java

  2) Attach a patch to be applied after the copy with the changes.

I think what you had done would be much clearer and I might not have
started this discussion.  Certainly reading the patch would have been
much easier and the *important* changes that really need reviewing would
have been obvious not cluttered like the current patch.[1]

Having said that, I think that would still be the wrong thing to do in
this case and you'd need:

  0) Attach a patch to fix some ugly things about HashMap

to avoid duplicating the IMHO less wonderful aspects of HashMap in
IdentityHashMap.

> I prefer gradual updates instead.

Well, at least, we can agree about that. ;-)

-Mark.

[0] http://sis36.berkeley.edu/projects/streek/agile/bad-smells-in-code.html

[1] The svn diff on the commits list would still be a mess but hopefully
the committer would use a log message that made it clear there was a
more readable version in the JIRA.

> On Tue, Apr 22, 2008 at 9:02 PM, Mark Hindess
> <mark.hindess@googlemail.com> wrote:
> >
> >  On 22 April 2008 at 15:23, "Aleksey Shipilev" <aleksey.shipilev@gmail.com>
> >  wrote:
> >
> > > Tim,
> >  >
> >  > I support your proposal and try to make distinction between
> >  > proof-of-concept and clean patches.
> >  >
> >  > Performance improvements could be notorious work, requiring a lot of
> >  > resources. The bad thing is, they could give nothing after the work is
> >  > done, that's why we are doing thought experiments and proof-of-concept
> >  > patches before actually implementing something. Proof-of-concept may
> >  > also give a hint on what maximum improvement we can get and set the
> >  > expectations for clean patch.
> >  >
> >  > So we are not looking for hiding the problems, rather we're exploring
> >  > what could it worth.
> >  >
> >  > F.ex. the experiments on IdentityHashMap performance finally led to
> >  > clean, non-breaking, consistent implementation, ready to be committed
> >  > patch [1]. Can we focus on reviewing it?
> >
> >  "clean [snip] patch"?  Okay, I'll take the bait...
> >
> >  1) It is good to fix the assert in the test case, but if you are going to
> >  change it at all you might as well replace:
> >
> >   assertTrue("TestB. removed entries incorrectly", map.size() == 11 &&
> >  !result);
> >
> >  with:
> >
> >   assertEquals("TestB. removed entries incorrectly, size is not correct",
> >                11, map.size());
> >   assertFalse("TestB. removed entries incorrectly, result is not correct",
> >               result);
> >
> >
> >  2) I think the IdentityHashMap javadoc comment change removes too much
> >  important information.
> >
> >
> >  3) I don't understand why you make arbitrary whitespace changes - such
> >  as rewriting:
> >
> >  public class IdentityHashMap<K, V> extends AbstractMap<K, V> implements
> >         Map<K, V>, Serializable, Cloneable {
> >
> >  to be longer than 80-characters.
> >
> >
> >  4) You rename DEFAULT_MAX_SIZE to DEFAULT_SIZE but I don't really see
> >  the relevance of this to the changes.
> >
> >
> >  5) You seem to remove a number of correct comments from the
> >  declarations of threshold, DEFAULT_MAX_SIZE, loadFactor, and modCount.
> >
> >  I almost I gave up reading the patch at this point out of frustration
> >  at the number of irrelevant changes.  Such as:
> >
> >  a) the re-ordering of often unchanged public methods - previously the
> >  public methods were grouped by similar functionality which seems logical
> >  to me but now they are in alphabetical order.  This makes the patch much
> >  bigger and harder to read.  If there is a good reason to re-order the
> >  methods, do it in a separate patch.
> >
> >
> >  b) The arbitrary changes to javadoc like:
> >
> >      /**
> >  -     * Searches this Map for the specified value.
> >  -     *
> >  -     *
> >  +     * Searches this IdentityHashMap for the specified value.
> >  +     *
> >       * @param value
> >       *            the object to search for
> >  -     * @return true if <code>value</code> is a value of this Map,
false
> >  +     * @return true if <code>value</code> is a value of this IdentityHash
> Map,
> >  false
> >       *         otherwise
> >       */
> >
> >  c) The arbitrary renaming of IdentityHashMapEntry to Entry.
> >
> >  Is it unrealistic to expect clean patches with justifications for
> >  changes?  Can we optimise patches as well as code?
> >
> >  If I apply the patch and then revert the whitespace changes,
> >  javadoc reformatting, renaming of DEFAULT_MAX_SIZE, renaming of
> >  IdentityHashMapEntry, etc. the the resulting svn diff contains at least
> >  33% fewer +/- lines than the one on the JIRA and is still functionally
> >  the same.
> >
> >  As someone who tries to read svn commit messages (and all committers or
> >  would-be-committers should be doing this to some extent!) patches like
> >  this are a nightmare.
> >
> >  Regards,
> >  -Mark.
> >
> >
> >
> >  >
> >  > Thanks,
> >  > Aleksey.
> >  >
> >  > [1] https://issues.apache.org/jira/browse/HARMONY-5771
> >  >
> >  > On Tue, Apr 22, 2008 at 12:09 PM, Tim Ellison <t.p.ellison@gmail.com>
wr
> ote:
> >  > > Aleksey Shipilev wrote:
> >  > >
> >  > > > Right, Tim. It was just the proof-of-concept "Does IdentityHashMap
> >  > > > tuning worth it?".
> >  > > >
> >  > >
> >  > >  I think it is unhelpful to mix actual, concrete suggestions for perfo
> rmanc
> >  > e
> >  > > improvements, with proof-of-concept ideas that break the specification
>  or
> >  > > security.
> >  > >
> >  > >  Please can people mark clearly in their mail whether they are suggest
> ing
> >  > > the idea only as "a thought experiment", otherwise somebody may apply

> the
> >  > > change without properly understanding the consequences.
> >  > >
> >  > >  I realize the onus is on the committer to ensure it is safe but hidin
> g suc
> >  > h
> >  > > problems is, as I said, unhelpful.
> >  > >
> >  > >  Thanks,
> >  > >  Tim
> >  > >
> >  > >
> >  > >
> >  > >
> >  > >
> >  > > > On Mon, Apr 21, 2008 at 4:50 PM, Tim Ellison <t.p.ellison@gmail.com>
> >  > > wrote:
> >  > > >
> >  > > > > Sergey Salishev wrote:
> >  > > > >
> >  > > > >
> >  > > > > > It also should be noticed that replacing the IdentityHashMap
wit
> h
> >  > > HashMap
> >  > > > > >
> >  > > > > in
> >  > > > >
> >  > > > > > Thread impl gives 4.5x boost on ThreadLocalBench.
> >  > > > > >
> >  > > > > >
> >  > > > >  But it would be wrong since it opens a security vulnerability.
> >  > > > >
> >  > > > >  See:
> >  > > > >  http://markmail.org/message/rwq3kif5n33wp65m
> >  > > > >  http://svn.apache.org/viewvc?view=rev&revision=477355
> >  > > > >
> >  > > > >
> >  > > > >  Regards,
> >  > > > >  Tim
> >  > > > >
> >  > > > >
> >  > > >
> >  > > >
> >  > >
> >  >
> >
> >
> >
> 



Mime
View raw message