spark-user mailing list archives

From Shyam P <shyamabigd...@gmail.com>
Subject [No Subject]
Date Tue, 05 Mar 2019 07:54:56 GMT
Hi All,
  I need to save a huge DataFrame as a Parquet file. Because it is so
large, the write takes several hours. I understand that to improve
performance I should write it out group-wise.

But when I use partitionBy(columns*) / groupBy(columns*), the driver spills
a lot of data and performance suffers badly again.

How can I handle this situation and save one group after another?

Here is a sample scenario describing the same problem:

https://stackoverflow.com/questions/54416623/how-to-group-dataframe-year-wise-and-iterate-through-groups-and-send-each-year-d
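
To illustrate, here is a minimal sketch of the approach in question. It assumes a DataFrame with a "year" grouping column (as in the Stack Overflow question above); the input and output paths are placeholders. Repartitioning by the grouping column before calling partitionBy keeps the per-group shuffling on the executors rather than the driver, and partitionBy itself writes one directory per year:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a "year" column as in the linked question;
// the paths below are hypothetical placeholders.
object GroupedParquetWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("grouped-write").getOrCreate()
    import spark.implicits._

    val df = spark.read.parquet("/path/to/input")

    // Repartition by the grouping column first so each group's rows are
    // colocated on executors (no collect/spill through the driver), then
    // let partitionBy lay the output out as one directory per year, e.g.
    // /path/to/output/year=2018/, /path/to/output/year=2019/, ...
    df.repartition($"year")
      .write
      .partitionBy("year")
      .mode("overwrite")
      .parquet("/path/to/output")

    spark.stop()
  }
}
```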

I would highly appreciate your help.

Thanks,
Shyam
