From commits-return-12063-archive-asf-public=cust-asf.ponee.io@hudi.apache.org Tue Feb 25 01:06:56 2020
Return-Path:
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
    by mx-eu-01.ponee.io (Postfix) with SMTP id 2A54C1802C7
    for ; Tue, 25 Feb 2020 02:06:56 +0100 (CET)
Received: (qmail 75939 invoked by uid 500); 25 Feb 2020 01:06:55 -0000
Mailing-List: contact commits-help@hudi.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@hudi.apache.org
Delivered-To: mailing list commits@hudi.apache.org
Received: (qmail 75929 invoked by uid 99); 25 Feb 2020 01:06:55 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70)
    by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Feb 2020 01:06:55 +0000
From: GitBox
To: commits@hudi.apache.org
Subject: [GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1352: [HUDI-625] Fixing performance issues around DiskBasedMap & kryo
Message-ID: <158259281543.12353.5968941813138015373.gitbox@gitbox.apache.org>
References:
In-Reply-To:
Date: Tue, 25 Feb 2020 01:06:55 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

garyli1019 commented on a change in pull request #1352: [HUDI-625] Fixing performance issues around DiskBasedMap & kryo
URL: https://github.com/apache/incubator-hudi/pull/1352#discussion_r383605610

 ##########
 File path: hudi-common/src/main/java/org/apache/hudi/common/util/SerializationUtils.java
 ##########
 @@ -121,50 +116,16 @@ Object deserialize(byte[] objectData) {
     public Kryo newKryo() {
-      Kryo kryo = new KryoBase();
+      Kryo kryo = new Kryo();
       // ensure that kryo doesn't fail if classes are not registered with kryo.
       kryo.setRegistrationRequired(false);
       // This would be used for object initialization if nothing else works out.
-      kryo.setInstantiatorStrategy(new org.objenesis.strategy.StdInstantiatorStrategy());
+      kryo.setInstantiatorStrategy(new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()));
       // Handle cases where we may have an odd classloader setup like with libjars
       // for hadoop
       kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
       return kryo;
     }
-    private static class KryoBase extends Kryo {
-      @Override
-      protected Serializer newDefaultSerializer(Class type) {
-        final Serializer serializer = super.newDefaultSerializer(type);
-        if (serializer instanceof FieldSerializer) {
-          final FieldSerializer fieldSerializer = (FieldSerializer) serializer;
-          fieldSerializer.setIgnoreSyntheticFields(true);
-        }
-        return serializer;
-      }
-
-      @Override
-      protected ObjectInstantiator newInstantiator(Class type) {
-        return () -> {
-          // First try reflectasm - it is fastest way to instantiate an object.
-          try {
-            final ConstructorAccess access = ConstructorAccess.get(type);
-            return access.newInstance();
-          } catch (Throwable t) {
-            // ignore this exception. We may want to try other way.
-          }
-          // fall back to java based instantiation.
-          try {
-            final Constructor constructor = type.getConstructor();
-            constructor.setAccessible(true);
-            return constructor.newInstance();
-          } catch (NoSuchMethodException | IllegalAccessException | InstantiationException
-              | InvocationTargetException e) {
-            // ignore this exception. we will fall back to default instantiation strategy.
-          }
-          return super.getInstantiatorStrategy().newInstantiatorOf(type).newInstance();

 Review comment:
   What I was curious about is what you mentioned in the ticket:
   ```
   kryo.register(HoodieKey.class, new HoodieKeySerializer());
   kryo.register(GenericData.Record.class, new GenericDataRecordSerializer());
   kryo.register(HoodieRecord.class, new HoodieRecordSerializer());
   kryo.register(HoodieRecordLocationSerializer.class, new HoodieRecordLocationSerializer());
   kryo.register(OverwriteWithLatestAvroPayload.class, new OverwriteWithLatestPayloadSerializer());
   ```
   Where is this done?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

With regards,
Apache Git Services