groovy-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jochen Theodorou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GROOVY-7448) AbstractConcurrentMap performing rehash() on every insert
Date Tue, 02 Jun 2015 13:18:19 GMT

    [ https://issues.apache.org/jira/browse/GROOVY-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569077#comment-14569077
] 

Jochen Theodorou commented on GROOVY-7448:
------------------------------------------

congrats, a good find you made there! (and a nice bug report as well)

If you want to make a pull request for this and get your name into the codebase you are very
welcome to do so. If not, then tell me and I do the change with setting c to count+1.

As for "question: why this is done, in this case the old array with repaired invalid elements
whould be sufficient ...? "
That is a good question... looking at AbstractConcurrentMapBase#rehash, I actually see two
loops that invalidate entries. The first one mutates the old table. The second loop will validate
the entries again, before copying them into the new table. Those two together don't make sense
to me. I could have said, that without mutating the old table, avoiding side effects for the
get method was the goal... though I don't see those. In my opinion, the second loop can be
skipped, if entries haven been invalidated, just as you said. But also the second loop does
not need to do a revalidation.


> AbstractConcurrentMap performing rehash() on every insert
> ---------------------------------------------------------
>
>                 Key: GROOVY-7448
>                 URL: https://issues.apache.org/jira/browse/GROOVY-7448
>             Project: Groovy
>          Issue Type: Bug
>          Components: groovy-jdk
>    Affects Versions: 2.4.3
>            Reporter: Tomas Bartalos
>            Assignee: Guillaume Laforge
>
> The problem happens in inner class org.codehaus.groovy.util.AbstractConcurrentMap.Segment.put()
method (line 105):
> When the current count is greater then map's threshold a rehash() happens. Now the tricky
part is, that the Map holds soft references to objects and rehash() validates those references.
So when GC discards soft references (e.isValid() == false), the resulting segment doesn't
expand (as it is assumed in the put() method).
> in the last line of rehash() the Segmen't internal counter is updated count = newCount
(which is the number of undiscarded (e.isValid() == true) references and can be smaller than
previous count as described above)
> after rehash() is done, the put() method continues, however the buggy part is, that it
disregards the previous setting of the internal count and it sets the previous count+1 value
in every case on lines 124, 143 and 159 of AbstractConcurrentMapBase class.
> Since the rehash() is called only from put() method, the internal counter is not synchronized
with real elements count contained in underlying array and rehash() is called on all subsequent
calls of put() method.
> The following steps are happening:
> 1) Map state: threshold = 786432; count=786432
> 2) New element is inserted into the map: count = 786433; threshold = 786432
> 3) since new count would be greater than threshold rehash() happens
> rehash() finds out, that most of the objects were garbage collected, thus it doesn't
increase the size of the Segment, however anyway it copies all the objects from one table
to another (System.arrayCopy()). (question: why this is done, in this case the old array with
repaired invalid elements whould be sufficient ...? )
> 4) rehash() sets the internal count to new value, which is smaller, because many objects
were garbage collected (soft references), lets say: count = 486 000
> 5) The put() continues, disregards the count = 486 000 and sets the count to count =
786433
> 6) When another element is inserted, the count is still greater than threshold so the
rehash happens again
> From now on every element added to map will trigger a rehash(), which has a huge performance
impact.
> When this happens in multithreaded environment, all the other threads are waiting (parked)
for lock() until the rehash() and put() is done (and then the next one is doing the rehash()
again and so on). You can imagine what performance impact this is.
> Proposed solution:
> Update the c variable after rehash is done. Between lines 105 and 106 of org.codehaus.groovy.util.AbstractConcurrentMap
 add:
> 104: if (c++ > threshold) {
> 105:                rehash();
> 106:                c = count + 1
> 107:}
> c = count + 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message