Date: Fri, 2 Aug 2019 10:25:00 +0000 (UTC)
From: "Manohar Chamaraju (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Updated] (PHOENIX-5410) Phoenix spark to hbase connector takes long time persist data

[ https://issues.apache.org/jira/browse/PHOENIX-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manohar Chamaraju updated PHOENIX-5410:
---------------------------------------
    Attachment:     (was: PHOENIX-5410.patch)

> Phoenix spark to hbase connector takes long time persist data
> -------------------------------------------------------------
>
>                 Key: PHOENIX-5410
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5410
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: connectors-1.0.0
>            Reporter: Manohar Chamaraju
>            Priority: Major
>         Attachments: PHOENIX-5410.patch
>
>
> While using the Phoenix Spark connector 1.0.0-SNAPSHOT ([https://github.com/apache/phoenix-connectors/tree/master/phoenix-spark]) with HBase, we found that writes were taking a very long time.
> Profiling the connector showed that 90% of CPU time is spent in the SparkJdbcUtil.toRow() method.
> [Profiler screenshot: https://files.slack.com/files-pri/T037D1PV9-FKYGD504A/image.png]
> Looking at the code, SparkJdbcUtil.toRow() gets called for every field of a row, and a new RowEncoder(schema).resolveAndBind() object is created on every iteration. As a result, many encoder objects are created and eventually collected by the GC, which consumes CPU cycles and degrades performance.
> Moreover, SparkJdbcUtil.toRow() is called by PhoenixDataWriter.write(), where the schema of the writer object is the same for all rows. We can therefore optimize the code by not creating these unnecessary objects, which yields a large performance improvement.
>
> With the changes in the patch, the time required for the write dropped from 30 minutes to less than 40 seconds in our test environment.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
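The optimization described in the issue amounts to hoisting a loop-invariant, expensive object construction out of the per-row write path. Below is a minimal, self-contained Java sketch of that general pattern only. It does not use Spark or Phoenix and is not the contents of PHOENIX-5410.patch. `CostlyEncoder`, `NaiveWriter`, and `CachedWriter` are hypothetical names. They stand in for RowEncoder(schema).resolveAndBind() and PhoenixDataWriter under the assumption that the encoder depends only on the schema.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an encoder that is costly to build
// (playing the role of RowEncoder(schema).resolveAndBind() in the report).
final class CostlyEncoder {
    // Counts constructions so the two writers can be compared.
    static int instancesCreated = 0;

    private final List<String> schema;

    CostlyEncoder(List<String> schema) {
        this.schema = List.copyOf(schema);
        instancesCreated++;
    }

    // Encodes one row as "COL=value,COL=value" according to the schema.
    String encode(List<Object> row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < schema.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(schema.get(i)).append('=').append(row.get(i));
        }
        return sb.toString();
    }
}

// Naive writer: builds a fresh encoder on every write() call, so the number
// of encoder objects (and GC work) grows with the number of rows.
final class NaiveWriter {
    private final List<String> schema;
    final List<String> out = new ArrayList<>();

    NaiveWriter(List<String> schema) { this.schema = schema; }

    void write(List<Object> row) {
        out.add(new CostlyEncoder(schema).encode(row));
    }
}

// Cached writer: the schema is fixed for the writer's lifetime,
// so the encoder is built once in the constructor and reused for every row.
final class CachedWriter {
    private final CostlyEncoder encoder;
    final List<String> out = new ArrayList<>();

    CachedWriter(List<String> schema) { this.encoder = new CostlyEncoder(schema); }

    void write(List<Object> row) {
        out.add(encoder.encode(row));
    }
}
```

Writing 1,000 rows through `NaiveWriter` constructs 1,000 encoders. `CachedWriter` constructs exactly one and produces identical output. Caching is safe here only because the encoder depends on nothing but the schema, which never changes for a given writer instance.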