spark-dev mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject [Structured Streaming] OOM on ConsoleSink with large inputs
Date Fri, 11 Aug 2017 22:00:35 GMT
Devs,

While investigating another issue, I came across an OOM error when using
the Console Sink with any source whose data can exceed the available
driver memory. In my case, I was using the File source with a 14 GB file
in the monitored directory.

I traced the issue back to a `df.collect` call in the Console Sink code.
I created a JIRA for it: https://issues.apache.org/jira/browse/SPARK-21710
and a PR is available: https://github.com/apache/spark/pull/18923
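
To illustrate why an unbounded collect hurts a sink that only prints a few rows, here is a minimal plain-Python analogy. It is not Spark code and it does not claim to reproduce the change in the PR; `CountingSource`, `show_unbounded`, and `show_bounded` are hypothetical names invented for this sketch, and the default of 20 displayed rows is an assumption.

```python
from itertools import islice
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


class CountingSource:
    """Hypothetical stand-in for a large batch; counts rows pulled from it."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.pulled = 0

    def __iter__(self) -> Iterator[Row]:
        for i in range(self.total):
            self.pulled += 1
            yield (i, f"value-{i}")


def show_unbounded(batch: CountingSource, num_rows: int = 20) -> List[Row]:
    """Materialize every row first, as collecting the whole batch would."""
    all_rows = list(batch)  # memory use grows with the size of the batch
    return all_rows[:num_rows]


def show_bounded(batch: CountingSource, num_rows: int = 20) -> Tuple[List[Row], bool]:
    """Pull at most num_rows + 1 rows; the extra row only signals truncation."""
    head = list(islice(batch, num_rows + 1))
    return head[:num_rows], len(head) > num_rows


if __name__ == "__main__":
    big = CountingSource(100_000)
    shown, truncated = show_bounded(big)
    print(len(shown), truncated, big.pulled)
```

In the bounded variant the amount of data brought into memory depends on the number of rows displayed rather than on the size of the input, which is the property a console-style sink needs.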

I hope a committer can check it out.

-kr, Gerard.
