phoenix-dev mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-5410) Phoenix spark to hbase connector takes long time persist data
Date Mon, 05 Aug 2019 08:18:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated PHOENIX-5410:
-----------------------------------
    Fix Version/s:     (was: 5.1.0)
                   connectors-1.0.0

> Phoenix spark to hbase connector takes long time persist data
> -------------------------------------------------------------
>
>                 Key: PHOENIX-5410
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5410
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: connectors-1.0.0
>            Reporter: Manohar Chamaraju
>            Priority: Major
>             Fix For: connectors-1.0.0
>
>         Attachments: PHOENIX-5410.patch
>
>
> While using the Phoenix Spark connector 1.0.0-SNAPSHOT ([https://github.com/apache/phoenix-connectors/tree/master/phoenix-spark])
> with HBase, I found that writes were taking a very long time.
> Profiling the connector showed that 90% of CPU time is consumed in the SparkJdbcUtil.toRow()
> method.
> !https://files.slack.com/files-pri/T037D1PV9-FKYGD504A/image.png!
> Looking at the code, SparkJdbcUtil.toRow() is called for every field of a row, and a new
> RowEncoder(schema).resolveAndBind() object is created on every iteration. As a result, many
> encoder objects are created and later collected by the GC, which consumes CPU cycles and
> degrades performance.
> Moreover, SparkJdbcUtil.toRow() is called by PhoenixDataWriter.write(), where the schema of
> the writer object is the same for all rows. The code can therefore be optimized there by
> not creating unnecessary objects, which yields a significant performance improvement.
>  
> With the changes in the patch, the time required for the write dropped from 30 minutes to
> less than 40 seconds in our test environment.
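
The pattern described above (building an encoder on every call versus building it once per writer) can be illustrated with a minimal, dependency-free sketch. This is not the attached patch and does not use Spark: RowEncoderStub, writePerCall and CachingWriter are hypothetical names standing in for the bound encoder, the per-call toRow() path, and a writer that holds one encoder for its fixed schema. The stub only counts constructions, so it shows the allocation pattern, not the real cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class EncoderReuseSketch {

    /** Hypothetical stand-in for a bound row encoder; NOT Spark's RowEncoder. */
    static final class RowEncoderStub {
        // Counts how many encoder instances have been constructed.
        static final AtomicInteger CREATED = new AtomicInteger();
        private final String[] columns;

        RowEncoderStub(String[] schema) {
            CREATED.incrementAndGet();
            this.columns = schema.clone(); // stands in for the real setup cost
        }

        String encode(Object[] row) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < columns.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(columns[i]).append('=').append(row[i]);
            }
            return sb.toString();
        }
    }

    /** Assumed shape of the slow path: a fresh encoder for every call. */
    static List<String> writePerCall(String[] schema, List<Object[]> rows) {
        List<String> out = new ArrayList<>();
        for (Object[] row : rows) {
            out.add(new RowEncoderStub(schema).encode(row));
        }
        return out;
    }

    /** Assumed shape of the fix: the schema is fixed per writer, so build the encoder once. */
    static final class CachingWriter {
        private final RowEncoderStub encoder;

        CachingWriter(String[] schema) {
            this.encoder = new RowEncoderStub(schema);
        }

        String write(Object[] row) {
            return encoder.encode(row);
        }
    }

    public static void main(String[] args) {
        String[] schema = {"ID", "NAME"};
        List<Object[]> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(new Object[] {i, "name" + i});

        RowEncoderStub.CREATED.set(0);
        writePerCall(schema, rows);
        System.out.println("encoders created per call: " + RowEncoderStub.CREATED.get());

        RowEncoderStub.CREATED.set(0);
        CachingWriter writer = new CachingWriter(schema);
        for (Object[] row : rows) writer.write(row);
        System.out.println("encoders created with caching: " + RowEncoderStub.CREATED.get());
    }
}
```

In this sketch the per-call path constructs one encoder per row, while the caching writer constructs exactly one for all rows and produces the same encoded output. Whether the real connector shows a comparable gain depends on how expensive encoder construction is there, which the sketch does not measure.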



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
