spark-user mailing list archives

From Shyam P <>
Subject [No Subject]
Date Tue, 05 Mar 2019 07:54:56 GMT
Hi All,
  I need to save a huge DataFrame as a Parquet file, and because of its size the write is taking several hours. From what I understand, I should write it out group-wise to improve performance.

But when I do partition(columns*)/groupBy(columns*), the driver spills a lot of data and performance takes a big hit again.
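For reference, this is roughly what I am doing now (a minimal sketch; group_id and the paths are placeholders for the real columns and locations):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("GroupedParquetWrite").getOrCreate()
val df = spark.read.parquet("/data/input")  // placeholder for the huge frame

// Repartitioning by the grouping column before partitionBy forces a full
// shuffle of the frame, and this is where the heavy spill shows up.
df.repartition(col("group_id"))
  .write
  .partitionBy("group_id")
  .mode("overwrite")
  .parquet("/data/output")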

So how do I handle this situation and save one group after another?
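What I have in mind is something like the following sketch, writing one group at a time (again with placeholder names; this assumes the number of distinct groups is small enough to collect to the driver):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("SequentialGroupWrite").getOrCreate()
val df = spark.read.parquet("/data/input")  // placeholder for the huge frame

// Collect the distinct group keys to the driver, then write each
// group's slice in turn instead of shuffling the whole frame at once.
val groups = df.select("group_id").distinct().collect().map(_.get(0))

df.cache()  // avoid rescanning the source once per group
groups.foreach { g =>
  df.filter(col("group_id") === g)
    .write
    .mode("overwrite")
    .parquet(s"/data/output/group_id=$g")
}
df.unpersist()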

I am attaching a sample scenario illustrating this.

I would highly appreciate your help.

