spark-issues mailing list archives

From "Manish Kumar (JIRA)" <>
Subject [jira] [Created] (SPARK-16169) Saving Intermediate dataframe increasing processing time upto 5 times.
Date Thu, 23 Jun 2016 10:35:16 GMT
Manish Kumar created SPARK-16169:

             Summary: Saving Intermediate dataframe increasing processing time upto 5 times.
                 Key: SPARK-16169
             Project: Spark
          Issue Type: Question
          Components: Spark Submit, Web UI
    Affects Versions: 1.6.1
         Environment: Amazon EMR
            Reporter: Manish Kumar

When a Spark application written in Scala saves an intermediate DataFrame, its processing
time increases almost fivefold.

Although the Spark UI clearly shows that all stages are completed, the Spark application
remains in the running state.

Below is the call that saves the intermediate output; the DataFrame is then used again afterwards.

saveDataFrame(flushPath, flushFormat, isCoalesce, flushMode, previousDataFrame, sqlContext)

Here, previousDataFrame is the result of the last step, and saveDataFrame just saves the
DataFrame to the given location; previousDataFrame is then used by the next steps/transformations.
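
The report does not show the body of saveDataFrame, so the following is only a hypothetical
sketch of what such a helper might look like against the Spark 1.6 DataFrame API. It assumes
that isCoalesce means "write a single output file" and that flushFormat and flushMode are
passed straight to the DataFrameWriter; neither is confirmed by the report. Because a write is
an action, an unpersisted DataFrame is recomputed from its lineage when later steps use it
again, so the sketch persists it first. Whether that recomputation explains the slowdown
described here is not established.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.storage.StorageLevel

object IntermediateSave {
  // Hypothetical reconstruction: only the call site appears in the report.
  def saveDataFrame(
      flushPath: String,
      flushFormat: String,
      isCoalesce: Boolean,
      flushMode: String,
      previousDataFrame: DataFrame,
      sqlContext: SQLContext): DataFrame = {
    // sqlContext is unused here; it is kept only to match the reported signature.

    // Persist before writing so the write and the later steps
    // can share one computation instead of replaying the lineage.
    val cached = previousDataFrame.persist(StorageLevel.MEMORY_AND_DISK)

    // Assumption: isCoalesce requests a single output partition.
    val toWrite = if (isCoalesce) cached.coalesce(1) else cached

    // flushFormat e.g. "parquet"; flushMode e.g. "overwrite" or "append".
    toWrite.write.format(flushFormat).mode(flushMode).save(flushPath)

    // Return the persisted DataFrame for the next steps/transformations.
    cached
  }
}
```

The caller would continue with the returned DataFrame and call unpersist() on it once the
downstream steps no longer need it.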

There is also some issue with the Spark UI. Please see below:
