geode-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "xiaojian zhou (JIRA)" <>
Subject [jira] [Resolved] (GEODE-6973) getExistingIdForType should not compare all entries in idToType region
Date Thu, 01 Aug 2019 17:24:00 GMT


xiaojian zhou resolved GEODE-6973.
       Resolution: Fixed
    Fix Version/s: 1.10.0

> getExistingIdForType should not compare all entries in idToType region
> ----------------------------------------------------------------------
>                 Key: GEODE-6973
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>            Reporter: xiaojian zhou
>            Assignee: xiaojian zhou
>            Priority: Major
>              Labels: GeodeCommons
>             Fix For: 1.10.0
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
> We found the PeerTypeRegistration's getExistingIdForType() will iterate through the idToType
region's entries to find if the incoming newType is there. 
> If idToType region contains 20K or 100K entries, this will impact the put throughput
(customers did notice the performance downgrade when there're many pdxTypes). 
> To make the things worse, the comparison is to compare the whole object, field to field.
If the json object (which will be converted to pdxType) contains 30 fields, the comparison
will have to compare up to 30 fields. If the idToType region contains 20K entries, A new pdxType
will do 20K  x 30 string comparisons before register it. 
> We found each server maintained a typeToId map, this map is used to check if the pdxType
exists. If exists, it will return the type id without check the IdToType region. The total
number of pdxType did not impact the put performance if the pdxTypd exists. 
> The typeToId map is maintained with a d-lock, each time we added a new pdxType, it will
update into the map while still holding the d-lok. So we believe that the map should be the
same as the region in content. If we cannot find the pdxType in the map, it should not be
in the region. We can skip the iteration of region (which is the root cause of the performance
> Another issue in current code is: when each time a new type come, it will recreate the
map. This is unnecessary and contributes to the slowness too. 
> We should only create the map during initialize(). 
> Here are the tests we want to introduce:
> 1) a junit test to prove that reorder fields in a big JSON file will not cause significant
hashcode conflicts (<1%)
> 2) a junit test to prove that add a index to a field in a big JSON file will hardly cause
hashcode conflicts. 
> This 2 tests are to prove that hashcode conflict is not the root cause of linear probing
for PDXTypeId. 
> 3) a junit test to prove that for the cases that hashcode conflict caused by reordered
fields, there will be no hashcode conflicts if using SORT_JSON_FIELD_NAMES_PROPERTY=true.

> 4) a dunit test to prove that SORT_JSON_FIELD_NAMES_PROPERTY=true or false did not impact
the performance to add a new pdxType. 
> 5) a dunit test to create a new pdxType from 2 peer server at the same time. The test
is to prove that the d-lock take effect, one server create the pdxType, and another server
should find the pdxType exists. 
> Do this test both from server directly and from clients. 
> 6) Create 2 different objects which ends up with the same hashcode (we can get the 2
objects from test-1), try to put the 2 objects to create new pdxType. The 2nd one should also
create a new type. It should not be treated as "found an existing pdxType". 

This message was sent by Atlassian JIRA

View raw message