phoenix-dev mailing list archives

From "Thomas D'Silva (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PHOENIX-2938) HFile support for SparkSQL DataFrame saves
Date Fri, 03 May 2019 17:57:00 GMT

     [ https://issues.apache.org/jira/browse/PHOENIX-2938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas D'Silva updated PHOENIX-2938:
------------------------------------
    Labels: spark  (was: )

> HFile support for SparkSQL DataFrame saves
> ------------------------------------------
>
>                 Key: PHOENIX-2938
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2938
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Chris Tarnas
>            Assignee: Kalyan
>            Priority: Minor
>              Labels: spark
>
> Currently, when a DataFrame is saved from Spark, it is persisted as upserts. Having an
> option to save natively via HFiles, as the MapReduce loader does, would be a great
> performance improvement for large bulk loads. The current workaround to reduce the load on
> the regionservers is to save to CSV from Spark and then load via the MapReduce loader.
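
For readers who want to try the workaround described above, here is a rough sketch in Python (PySpark). It is not part of the issue and is not the implementation being requested. The helper names, the table name EXAMPLE, the paths, and the jar file name are all hypothetical; the only pieces taken from Phoenix itself are the CsvBulkLoadTool class name and its --table, --input, and --zookeeper options.

```python
def write_csv_from_spark(df, csv_path):
    """Step 1: write a Spark DataFrame as headerless CSV files.

    `df` is assumed to be a pyspark.sql.DataFrame whose column order
    matches the target Phoenix table. The header is left off on the
    assumption that the loader treats every line as a data row.
    """
    df.write.mode("overwrite").csv(csv_path, header=False)


def build_csv_bulk_load_command(table, csv_path, client_jar, zookeeper=None):
    """Step 2: build the argv for Phoenix's MapReduce CSV bulk loader.

    Only the command is built here; running it (for example with
    subprocess.run) is left to the caller, since it needs a configured
    Hadoop/HBase client environment.
    """
    argv = [
        "hadoop", "jar", client_jar,
        "org.apache.phoenix.mapreduce.CsvBulkLoadTool",
        "--table", table,
        "--input", csv_path,
    ]
    if zookeeper:
        # Optional ZooKeeper quorum, if it is not picked up from the config.
        argv += ["--zookeeper", zookeeper]
    return argv


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_csv_bulk_load_command(
        table="EXAMPLE",
        csv_path="/data/example.csv",
        client_jar="phoenix-client.jar",
    )
    print(" ".join(cmd))
```

The two steps are kept separate because the CSV write runs inside a Spark job, while the bulk load is a separate MapReduce job that writes HFiles and hands them to HBase instead of issuing upserts.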



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
