phoenix-dev mailing list archives

From "Manohar Chamaraju (JIRA)" <>
Subject [jira] [Updated] (PHOENIX-5410) Phoenix Spark to HBase connector takes a long time to persist data
Date Wed, 24 Jul 2019 10:59:00 GMT


Manohar Chamaraju updated PHOENIX-5410:
    Attachment: PHOENIX-5410.patch

> Phoenix Spark to HBase connector takes a long time to persist data
> ------------------------------------------------------------------
>                 Key: PHOENIX-5410
>                 URL:
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: connectors-1.0.0
>            Reporter: Manohar Chamaraju
>            Priority: Major
>         Attachments: PHOENIX-5410.patch
> While using the Phoenix Spark connector 1.0.0-SNAPSHOT ([]) for HBase, we found that writes were taking a very long time.
> On profiling the connector, we found that 90% of CPU time is consumed in the method SparkJdbcUtil.toRow().
> Looking into the code, SparkJdbcUtil.toRow() is called for every field of every row, and a RowEncoder(schema).resolveAndBind() object is created on each call. As a result, large numbers of short-lived encoder objects are allocated and then collected by the GC, burning CPU cycles and degrading performance.
> Moreover, SparkJdbcUtil.toRow() is called by PhoenixDataWriter.write(), where the schema of the writer object is the same for all rows. We can therefore optimize the code by creating the encoder once and reusing it, avoiding the unnecessary object creation and gaining a significant performance improvement.
> With the changes in the attached patch, the time required for the write dropped from 30 minutes to less than a second in our test environment.
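The optimization described above — hoisting a schema-dependent, expensive-to-construct object out of the per-field loop — can be sketched as follows. This is a minimal Java sketch, not the actual patch: BoundEncoder, slowWrite, and fastWrite are hypothetical stand-ins for Spark's resolved-and-bound RowEncoder and PhoenixDataWriter.write().

```java
import java.util.List;

// Hypothetical stand-in for Spark's RowEncoder(schema).resolveAndBind():
// expensive to construct, but reusable for every row sharing the same schema.
class BoundEncoder {
    private final String schema;

    BoundEncoder(String schema) {
        // Imagine costly schema analysis and expression binding here.
        this.schema = schema;
    }

    String toRow(String field) {
        return schema + ":" + field;
    }
}

class PhoenixWriterSketch {
    // Before (the reported hot path): a new encoder per field, per row,
    // producing garbage that the GC must collect.
    static int slowWrite(List<List<String>> rows, String schema) {
        int written = 0;
        for (List<String> row : rows) {
            for (String field : row) {
                BoundEncoder enc = new BoundEncoder(schema); // allocated every iteration
                enc.toRow(field);
                written++;
            }
        }
        return written;
    }

    // After (the idea behind the patch): the writer's schema is fixed,
    // so build the encoder once and reuse it for all rows and fields.
    static int fastWrite(List<List<String>> rows, String schema) {
        BoundEncoder enc = new BoundEncoder(schema); // created once
        int written = 0;
        for (List<String> row : rows) {
            for (String field : row) {
                enc.toRow(field);
                written++;
            }
        }
        return written;
    }
}
```

Both variants produce the same result; the difference is purely in allocation behavior, which is what the profiler flagged.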

This message was sent by Atlassian JIRA
