From issues-return-52034-archive-asf-public=cust-asf.ponee.io@geode.apache.org  Thu Aug  1 17:24:03 2019
Return-Path: <issues-return-52034-archive-asf-public=cust-asf.ponee.io@geode.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 4D5D918065E
	for <archive-asf-public@cust-asf.ponee.io>; Thu,  1 Aug 2019 19:24:03 +0200 (CEST)
Received: (qmail 55826 invoked by uid 500); 1 Aug 2019 17:24:02 -0000
Mailing-List: contact issues-help@geode.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:issues-help@geode.apache.org>
List-Unsubscribe: <mailto:issues-unsubscribe@geode.apache.org>
List-Post: <mailto:issues@geode.apache.org>
List-Id: <issues.geode.apache.org>
Reply-To: dev@geode.apache.org
Delivered-To: mailing list issues@geode.apache.org
Received: (qmail 55803 invoked by uid 99); 1 Aug 2019 17:24:02 -0000
Received: from mailrelay1-us-west.apache.org (HELO mailrelay1-us-west.apache.org) (209.188.14.139)
    by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 01 Aug 2019 17:24:02 +0000
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 0BF21E2F9B
	for <issues@geode.apache.org>; Thu,  1 Aug 2019 17:24:02 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
	by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 8A05126650
	for <issues@geode.apache.org>; Thu,  1 Aug 2019 17:24:00 +0000 (UTC)
Date: Thu, 1 Aug 2019 17:24:00 +0000 (UTC)
From: "xiaojian zhou (JIRA)" <jira@apache.org>
To: issues@geode.apache.org
Message-ID: <JIRA.13245334.1563315674000.90818.1564680240563@Atlassian.JIRA>
In-Reply-To: <JIRA.13245334.1563315674000@Atlassian.JIRA>
References: <JIRA.13245334.1563315674000@Atlassian.JIRA> <JIRA.13245334.1563315674348@jira-lw-us.apache.org>
Subject: [jira] [Resolved] (GEODE-6973) getExistingIdForType should not
 compare all entries in idToType region
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/GEODE-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiaojian zhou resolved GEODE-6973.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.10.0

> getExistingIdForType should not compare all entries in idToType region
> ----------------------------------------------------------------------
>
>                 Key: GEODE-6973
>                 URL: https://issues.apache.org/jira/browse/GEODE-6973
>             Project: Geode
>          Issue Type: Bug
>            Reporter: xiaojian zhou
>            Assignee: xiaojian zhou
>            Priority: Major
>              Labels: GeodeCommons
>             Fix For: 1.10.0
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We found that PeerTypeRegistration.getExistingIdForType() iterates through the entries of the idToType region to check whether the incoming newType is already registered.
> When the idToType region contains 20K or 100K entries, this hurts put throughput (customers did notice the performance degradation when there are many pdxTypes).
> To make things worse, the comparison checks the whole object, field by field. If the JSON object (which is converted to a pdxType) contains 30 fields, each comparison may have to compare up to 30 fields. If the idToType region contains 20K entries, registering one new pdxType can take up to 20K x 30 = 600K string comparisons.
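
The cost described above can be illustrated with a small sketch. It is a hypothetical, simplified model and not Geode code. A type is reduced to its ordered list of field names, a `HashMap` stands in for the idToType region, and `findExistingId` imitates the linear scan. In the worst case every registered type shares all but its last field with the new type, so each entry costs the full 30 comparisons before it is rejected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the pre-fix lookup: a "type" is its ordered list of
// field names, and a HashMap stands in for the idToType region.
final class LinearScanSketch {

    // Counts individual field-name comparisons across one lookup.
    static long fieldComparisons;

    // Field-by-field equality, as in a whole-object comparison.
    static boolean sameFields(List<String> a, List<String> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            fieldComparisons++;
            if (!a.get(i).equals(b.get(i))) {
                return false;
            }
        }
        return true;
    }

    // Linear scan over every registered type; returns null if none matches.
    static Integer findExistingId(Map<Integer, List<String>> idToType, List<String> newType) {
        for (Map.Entry<Integer, List<String>> entry : idToType.entrySet()) {
            if (sameFields(entry.getValue(), newType)) {
                return entry.getKey();
            }
        }
        return null;
    }

    // Worst case: every registered type shares all but the last field with
    // the new type, so each entry costs fieldsPerType comparisons.
    static long worstCaseComparisons(int typeCount, int fieldsPerType) {
        List<String> sharedPrefix = new ArrayList<>();
        for (int f = 0; f < fieldsPerType - 1; f++) {
            sharedPrefix.add("field" + f);
        }
        Map<Integer, List<String>> idToType = new HashMap<>();
        for (int id = 0; id < typeCount; id++) {
            List<String> fields = new ArrayList<>(sharedPrefix);
            fields.add("last" + id);
            idToType.put(id, fields);
        }
        List<String> newType = new ArrayList<>(sharedPrefix);
        newType.add("lastNew");

        fieldComparisons = 0;
        if (findExistingId(idToType, newType) != null) {
            throw new IllegalStateException("the new type should not match");
        }
        return fieldComparisons;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseComparisons(20_000, 30)); // 600000
    }
}
```

Running `main` prints 600000, which matches 20K x 30 for a single new type.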
> We found that each server maintains a typeToId map, which is used to check whether a pdxType already exists. If it does, the type id is returned without checking the idToType region, so the total number of pdxTypes does not affect put performance when the pdxType already exists.
> The typeToId map is maintained under a d-lock: each time a new pdxType is added, the map is updated while the d-lock is still held. So we believe the map's content should match the region's. If we cannot find the pdxType in the map, it should not be in the region either, and we can skip iterating over the region (which is the root cause of the performance issue).
> Another issue in the current code is that the map is recreated each time a new type comes in. This is unnecessary and also contributes to the slowness.
> We should create the map only once, during initialize().
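
The proposed lookup can be sketched as follows. This sketch assumes a single JVM. `ReentrantLock` stands in for the d-lock and a `HashMap` stands in for the idToType region. The names `TypeRegistrySketch`, `defineType` and `defineConcurrently` are hypothetical and are not Geode's API. typeToId is built once in `initialize()`, and a map hit returns the id without touching the region. A miss is re-checked and registered while the lock is held, so two concurrent callers defining the same type get the same id.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical single-JVM model of the proposed fix; not Geode's API.
final class TypeRegistrySketch {
    private final Map<Integer, List<String>> idToType = new HashMap<>(); // "region", guarded by dLock
    private final Map<List<String>, Integer> typeToId = new ConcurrentHashMap<>(); // written under dLock
    private final ReentrantLock dLock = new ReentrantLock(); // stand-in for the d-lock
    private int nextId = 1; // guarded by dLock

    TypeRegistrySketch(Map<Integer, List<String>> existingRegion) {
        existingRegion.forEach((id, fields) -> idToType.put(id, List.copyOf(fields)));
    }

    // Builds typeToId once from the region instead of on every new type.
    void initialize() {
        dLock.lock();
        try {
            typeToId.clear();
            for (Map.Entry<Integer, List<String>> e : idToType.entrySet()) {
                typeToId.put(e.getValue(), e.getKey());
                nextId = Math.max(nextId, e.getKey() + 1);
            }
        } finally {
            dLock.unlock();
        }
    }

    // Returns the existing id or registers a new type; never scans the region.
    int defineType(List<String> fields) {
        List<String> key = List.copyOf(fields);
        Integer existing = typeToId.get(key);
        if (existing != null) {
            return existing; // fast path: no lock, no region scan
        }
        dLock.lock();
        try {
            existing = typeToId.get(key); // another caller may have won the race
            if (existing != null) {
                return existing;
            }
            int id = nextId++;
            idToType.put(id, key);
            typeToId.put(key, id); // map updated while the lock is still held
            return id;
        } finally {
            dLock.unlock();
        }
    }

    int typeCount() {
        dLock.lock();
        try {
            return idToType.size();
        } finally {
            dLock.unlock();
        }
    }

    // Two threads define the same type at the same moment (cf. test 5).
    static int[] defineConcurrently(TypeRegistrySketch registry, List<String> fields) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch start = new CountDownLatch(1);
        Callable<Integer> task = () -> {
            start.await();
            return registry.defineType(fields);
        };
        try {
            Future<Integer> first = pool.submit(task);
            Future<Integer> second = pool.submit(task);
            start.countDown();
            return new int[] {first.get(), second.get()};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TypeRegistrySketch registry = new TypeRegistrySketch(Map.of(7, List.of("a", "b")));
        registry.initialize();
        System.out.println(registry.defineType(List.of("a", "b"))); // 7
        int[] ids = defineConcurrently(registry, List.of("x", "y"));
        System.out.println(ids[0] + " " + ids[1] + " " + registry.typeCount()); // 8 8 2
    }
}
```

The re-check under the lock keeps the map and the region consistent. In Geode the lock is distributed across members, and this sketch does not model that.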
> Here are the tests we want to introduce:
> 1) A JUnit test to prove that reordering the fields in a big JSON document does not cause significant hashcode conflicts (<1%).
> 2) A JUnit test to prove that adding an index to a field in a big JSON document hardly ever causes hashcode conflicts.
> These 2 tests are to prove that hashcode conflicts are not the root cause of the linear probing for PDXTypeId.
> 3) A JUnit test to prove that, in the cases where hashcode conflicts are caused by reordered fields, there are no hashcode conflicts when SORT_JSON_FIELD_NAMES_PROPERTY=true.
> 4) A DUnit test to prove that setting SORT_JSON_FIELD_NAMES_PROPERTY to true or false does not affect the performance of adding a new pdxType.
> 5) A DUnit test that creates the same new pdxType from 2 peer servers at the same time. The test proves that the d-lock takes effect: one server creates the pdxType, and the other server finds that the pdxType already exists.
> Run this test both directly on the servers and from clients.
> 6) Create 2 different objects that end up with the same hashcode (we can get the 2 objects from test 1) and put both of them to create new pdxTypes. The 2nd one should also create a new type; it should not be treated as "found an existing pdxType".
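
Tests 3 and 6 rest on two properties that can be shown in plain Java. This is a hypothetical sketch and not the PdxType implementation. A type is modeled as a list of field names, and the `sortFieldNames` flag only imitates the described effect of SORT_JSON_FIELD_NAMES_PROPERTY. The strings "Aa" and "BB" have the same `String.hashCode()` (2112), so `List.of("Aa")` and `List.of("BB")` collide. Because `HashMap` falls back to `equals()`, they still receive different ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: a type is its list of field names. sortFieldNames only
// imitates the described effect of SORT_JSON_FIELD_NAMES_PROPERTY.
final class TypeKeySketch {
    private final Map<List<String>, Integer> typeToId = new HashMap<>();
    private final boolean sortFieldNames;
    private int nextId = 1;

    TypeKeySketch(boolean sortFieldNames) {
        this.sortFieldNames = sortFieldNames;
    }

    int defineType(List<String> fieldNames) {
        List<String> key = new ArrayList<>(fieldNames);
        if (sortFieldNames) {
            key.sort(null); // natural (alphabetical) String order
        }
        // HashMap compares keys with equals() after hashCode(), so a hash
        // collision alone never makes two different types share an id.
        return typeToId.computeIfAbsent(List.copyOf(key), k -> nextId++);
    }

    public static void main(String[] args) {
        List<String> first = List.of("Aa");
        List<String> second = List.of("BB");
        System.out.println(first.hashCode() == second.hashCode()); // true

        TypeKeySketch unsorted = new TypeKeySketch(false);
        System.out.println(unsorted.defineType(first) + " " + unsorted.defineType(second)); // 1 2

        // Reordered fields: distinct types without sorting, one type with it.
        TypeKeySketch sorted = new TypeKeySketch(true);
        System.out.println(unsorted.defineType(List.of("x", "y")) == unsorted.defineType(List.of("y", "x"))); // false
        System.out.println(sorted.defineType(List.of("x", "y")) == sorted.defineType(List.of("y", "x"))); // true
    }
}
```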

