phoenix-dev mailing list archives

From "Manohar Chamaraju (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PHOENIX-5410) Phoenix spark takes long time persist data to hbase
Date Wed, 24 Jul 2019 10:00:03 GMT
Manohar Chamaraju created PHOENIX-5410:
------------------------------------------

             Summary: Phoenix spark takes long time persist data to hbase
                 Key: PHOENIX-5410
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5410
             Project: Phoenix
          Issue Type: Bug
            Reporter: Manohar Chamaraju


While using the phoenix spark connector 1.0.0-SNAPSHOT ([https://github.com/apache/phoenix-connectors/tree/master/phoenix-spark])
 to write to HBase, I found that writes were taking a very long time.

On profiling the connector, I found that about 90% of CPU time is consumed in the SparkJdbcUtil.toRow()
method.

(Profiler screenshot: https://files.slack.com/files-pri/T037D1PV9-FKYGD504A/image.png)

Looking into the code, SparkJdbcUtil.toRow() is called for every field of a row, and a
RowEncoder(schema).resolveAndBind() object is created on every iteration. As a result, large
numbers of encoder objects are created and then garbage collected, which burns CPU cycles and
degrades performance.

Moreover, SparkJdbcUtil.toRow() is called by PhoenixDataWriter.write(), where the writer's schema
is the same for all rows. We can therefore optimize the code by avoiding the creation of these
unnecessary objects and gain a significant performance improvement.
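A minimal sketch of the proposed optimization, with a hypothetical ResolvedEncoder class standing in for Spark's RowEncoder(schema).resolveAndBind() (the class and counter below are illustration only, not the connector's actual API): hoist the expensive per-schema object out of the per-row write path and construct it once per writer.

```java
import java.util.List;

// Hypothetical stand-in for Spark's RowEncoder(schema).resolveAndBind():
// expensive to construct, but reusable as long as the schema is fixed.
class ResolvedEncoder {
    static int constructions = 0; // counts allocations, for illustration only

    ResolvedEncoder(String schema) {
        constructions++;
    }

    String encode(String row) {
        return "enc(" + row + ")";
    }
}

// Sketch of a writer shaped like PhoenixDataWriter: the schema is the same
// for every row it writes, so the encoder can be built once in the constructor.
class PhoenixDataWriterSketch {
    private final ResolvedEncoder encoder;

    PhoenixDataWriterSketch(String schema) {
        // Before the fix, the equivalent of this construction ran inside
        // write() for every row (and every field), flooding the GC.
        this.encoder = new ResolvedEncoder(schema);
    }

    String write(String row) {
        return encoder.encode(row); // per-row path now allocates no encoder
    }
}

public class Main {
    public static void main(String[] args) {
        PhoenixDataWriterSketch writer =
            new PhoenixDataWriterSketch("id INT, name VARCHAR");
        for (String row : List.of("r1", "r2", "r3")) {
            System.out.println(writer.write(row));
        }
        // Three rows written, but only one encoder was ever constructed.
        System.out.println("encoders constructed: " + ResolvedEncoder.constructions);
    }
}
```

With this shape, the number of encoder allocations is bounded by the number of writers rather than the number of rows times fields, which is where the GC pressure seen in the profile comes from.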



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
