Date: Mon, 28 Aug 2017 17:26:00 +0000 (UTC)
From: "Ratandeep Ratti (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-17394) AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row

     [ https://issues.apache.org/jira/browse/HIVE-17394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ratandeep Ratti updated HIVE-17394:
-----------------------------------
    Attachment: AvroSerDeUnionTypeInfo.png
                AvroSerDe.nps

> AvroSerde is regenerating TypeInfo objects for each nullable Avro field for every row
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-17394
>                 URL: https://issues.apache.org/jira/browse/HIVE-17394
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Ratandeep Ratti
>         Attachments: AvroSerDe.nps, AvroSerDeUnionTypeInfo.png
>
>
> {{AvroDeserializer}} keeps regenerating TypeInfo objects for every nullable field in every row.
> This happens in the following methods:
> {code}
> private Object deserializeNullableUnion(Object datum, Schema fileSchema, Schema recordSchema) throws AvroSerdeException {
>   // elided
>   // line 312:
>   return worker(datum, fileSchema, newRecordSchema,
>       SchemaToTypeInfo.generateTypeInfo(newRecordSchema, null));
> }
> ..
> private Object deserializeSingleItemNullableUnion(Object datum, Schema fileSchema, Schema recordSchema)
>   // elided
>   // line 357:
>   return worker(datum, currentFileSchema, schema,
>       SchemaToTypeInfo.generateTypeInfo(schema, null));
> {code}
> This is really bad for performance. I'm not sure why we didn't use the TypeInfo we already have instead of generating it again for each nullable field. If you look at the {{worker}} method, which calls {{deserializeNullableUnion}}, the TypeInfo corresponding to the nullable field's column is already determined. I'm not sure why we have to determine that information again.
> Moreover, the cache in {{SchemaToTypeInfo}} does not help in the nullable Avro record case, because checking whether an Avro record schema object already exists in the cache requires traversing all the fields in the record schema.
> I've attached a profiling snapshot which shows that most of the time is being spent in the cache.
> One way of fixing this, IMO, is to make use of the column TypeInfo which is already passed into the {{worker}} method.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
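
The cost described in the issue, and the proposed fix of reusing the TypeInfo that the caller already holds, can be illustrated with a small standalone sketch. Everything in it is hypothetical and only stands in for the real classes: {{RecordSchema}}, {{TypeInfo}}, {{generateTypeInfo}} and {{worker}} here are simplified placeholders, not Hive's or Avro's actual implementations, and the field-visit counter is just a rough proxy for the time the profiler attributes to the cache. The assumption being modelled is that hashing a record schema walks all of its fields, so a per-row cache lookup pays that cost per row even on a cache hit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NullableUnionSketch {

    /** Hypothetical stand-in for a record schema: hashing and equality walk every field. */
    static final class RecordSchema {
        static long fieldVisits = 0; // counts the work done inside hashCode/equals
        final List<String> fields;

        RecordSchema(List<String> fields) { this.fields = fields; }

        @Override public int hashCode() {
            int h = 17;
            for (String f : fields) { fieldVisits++; h = 31 * h + f.hashCode(); }
            return h;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof RecordSchema)) return false;
            List<String> other = ((RecordSchema) o).fields;
            if (other.size() != fields.size()) return false;
            for (int i = 0; i < fields.size(); i++) {
                fieldVisits++;
                if (!fields.get(i).equals(other.get(i))) return false;
            }
            return true;
        }
    }

    /** Hypothetical stand-in for a resolved column type. */
    static final class TypeInfo {
        final String name;
        TypeInfo(String name) { this.name = name; }
    }

    static final Map<RecordSchema, TypeInfo> CACHE = new HashMap<>();

    /** Cached lookup: a hit still has to hash the schema key, which visits every field. */
    static TypeInfo generateTypeInfo(RecordSchema schema) {
        return CACHE.computeIfAbsent(schema, k -> new TypeInfo("struct<" + k.fields.size() + ">"));
    }

    /** Placeholder for the per-row deserialization work. */
    static int worker(TypeInfo typeInfo) { return typeInfo.name.length(); }

    static RecordSchema makeSchema(int fieldCount) {
        List<String> fields = new ArrayList<>();
        for (int i = 0; i < fieldCount; i++) fields.add("field" + i);
        return new RecordSchema(fields);
    }

    /** Pattern reported in the issue: resolve the TypeInfo again for every row. */
    public static long lookupPerRow(int rows, int fieldCount) {
        RecordSchema schema = makeSchema(fieldCount);
        CACHE.clear();
        RecordSchema.fieldVisits = 0;
        for (int r = 0; r < rows; r++) {
            worker(generateTypeInfo(schema));
        }
        return RecordSchema.fieldVisits;
    }

    /** Proposed direction: resolve once, then pass the same TypeInfo to every row. */
    public static long reusePerRow(int rows, int fieldCount) {
        RecordSchema schema = makeSchema(fieldCount);
        CACHE.clear();
        TypeInfo columnTypeInfo = generateTypeInfo(schema); // done once, outside the row loop
        RecordSchema.fieldVisits = 0;
        for (int r = 0; r < rows; r++) {
            worker(columnTypeInfo);
        }
        return RecordSchema.fieldVisits;
    }

    public static void main(String[] args) {
        System.out.println("field visits, lookup per row: " + lookupPerRow(1000, 50));
        System.out.println("field visits, reuse per row:  " + reusePerRow(1000, 50));
    }
}
```

In this sketch the per-row lookup grows with rows times fields, while the reuse variant does no schema traversal inside the row loop. Whether the real deserializer can always hand the existing column TypeInfo to the nullable-union branches is for the actual patch to establish.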