spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-21971) Too many open files in Spark due to concurrent files being opened
Date Mon, 11 Sep 2017 01:50:02 GMT

     [ https://issues.apache.org/jira/browse/SPARK-21971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Apache Spark reassigned SPARK-21971:
------------------------------------

    Assignee:     (was: Apache Spark)

> Too many open files in Spark due to concurrent files being opened
> -----------------------------------------------------------------
>
>                 Key: SPARK-21971
>                 URL: https://issues.apache.org/jira/browse/SPARK-21971
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 2.1.0
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> When running Q67 of TPC-DS on a 1 TB dataset on a multi-node cluster, it consistently fails
with a "too many open files" exception.
> {noformat}
> O scheduler.TaskSetManager: Finished task 25.0 in stage 844.0 (TID 243786) in 394 ms on machine111.xyz (executor 2) (189/200)
> 17/08/20 10:33:45 INFO scheduler.TaskSetManager: Finished task 172.0 in stage 844.0 (TID 243932) in 11996 ms on cn116-10.l42scl.hortonworks.com (executor 6) (190/200)
> 17/08/20 10:37:40 WARN scheduler.TaskSetManager: Lost task 144.0 in stage 844.0 (TID 243904, machine1.xyz, executor 1): java.nio.file.FileSystemException: /grid/3/hadoop/yarn/local/usercache/rbalamohan/appcache/application_1490656001509_7207/blockmgr-5180e3f0-f7ed-44bb-affc-8f99f09ba7bc/28/temp_local_690afbf7-172d-4fdb-8492-3e2ebd8d5183: Too many open files
>         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>         at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>         at java.nio.channels.FileChannel.open(FileChannel.java:287)
>         at java.nio.channels.FileChannel.open(FileChannel.java:335)
>         at org.apache.spark.io.NioBufferedFileInputStream.<init>(NioBufferedFileInputStream.java:43)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.<init>(UnsafeSorterSpillReader.java:75)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.getReader(UnsafeSorterSpillWriter.java:150)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.getIterator(UnsafeExternalSorter.java:607)
>         at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.generateIterator(ExternalAppendOnlyUnsafeRowArray.scala:169)
>         at org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray.generateIterator(ExternalAppendOnlyUnsafeRowArray.scala:173)
> {noformat}
> The cluster was configured with multiple cores per executor.
> The window function uses "spark.sql.windowExec.buffer.spill.threshold=4096", which causes
a large number of spills on larger datasets. With multiple cores per executor, this reproduces
easily.
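A back-of-the-envelope estimate illustrates why a low spill threshold multiplies open file descriptors. All numbers below are hypothetical assumptions for illustration, not measurements from the reported run:

```java
// Rough estimate of concurrent spill files per executor.
// All inputs are illustrative assumptions, not measured values.
public class SpillFileEstimate {
    public static void main(String[] args) {
        long rowsPerPartition = 2_000_000L; // assumed rows buffered by one window partition
        long spillThreshold   = 4_096L;     // spark.sql.windowExec.buffer.spill.threshold
        int  tasksPerExecutor = 8;          // assumed concurrent tasks (cores per executor)

        long spillFilesPerTask    = rowsPerPartition / spillThreshold;   // 488
        long openFilesPerExecutor = spillFilesPerTask * tasksPerExecutor; // 3904

        System.out.println("spill files per task: " + spillFilesPerTask);
        System.out.println("concurrent open files per executor: " + openFilesPerExecutor);
    }
}
```

Under these assumptions a single executor can approach the common default ulimit of 4096 descriptors from window spills alone, before counting shuffle files, sockets, and JARs.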
> {{UnsafeExternalSorter::getIterator()}} invokes {{spillWriter.getReader}} for all of the
available spill writers. {{UnsafeSorterSpillReader}} opens the file in its constructor and
closes it only in its {{close()}} call. This causes the "too many open files" issue.
> Note that this is not a file leak; rather, it is a matter of how many files are open concurrently
at any given time, which depends on the dataset being processed.
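The eager-versus-lazy distinction described above can be sketched in plain Java (this is not Spark code; the file names and counts are invented for illustration). The eager variant opens a stream per spill file up front, so descriptors scale with the number of spill files; the lazy variant defers each open until that file is actually consumed:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Minimal sketch (not Spark code) of the reported pattern.
public class SpillReaderSketch {
    public static void main(String[] args) throws IOException {
        // Create a few stand-in "spill files".
        List<Path> spills = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Path p = Files.createTempFile("spill-", ".tmp");
            Files.write(p, ("chunk-" + i + "\n").getBytes());
            spills.add(p);
        }

        // Eager: one open stream per spill file, all at once.
        List<InputStream> eager = new ArrayList<>();
        for (Path p : spills) eager.add(Files.newInputStream(p));
        System.out.println("eagerly open streams: " + eager.size());
        for (InputStream in : eager) in.close();

        // Lazy: hold suppliers; open one file at a time while reading.
        List<Supplier<InputStream>> lazy = new ArrayList<>();
        for (Path p : spills) {
            lazy.add(() -> {
                try { return Files.newInputStream(p); }
                catch (IOException e) { throw new UncheckedIOException(e); }
            });
        }
        for (Supplier<InputStream> s : lazy) {
            try (InputStream in = s.get()) { // at most one stream open here
                in.read();
            }
        }
        System.out.println("max concurrently open (lazy): 1");

        for (Path p : spills) Files.deleteIfExists(p);
    }
}
```

With the eager pattern, a merge over N spill files holds N descriptors for its whole duration; with the lazy pattern the peak would be far lower, at the cost of restructuring the merge.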
> One option could be to increase "spark.sql.windowExec.buffer.spill.threshold" so that
fewer spill files are generated, but it is hard to determine a sweet spot for all workloads.
Another option is to raise the open-file ulimit to "unlimited", but that would not be a good production
setting. It would be good to consider reducing the number of files opened concurrently by {{UnsafeExternalSorter::getIterator()}}.
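For reference, the two workarounds mentioned above could be applied roughly as follows (the threshold value is illustrative, not a tuned recommendation, and raising the ulimit must be done on every executor host):

```shell
# Workaround 1: raise the window-spill threshold so fewer spill files
# are created (1048576 is an illustrative value, not a recommendation).
spark-submit \
  --conf spark.sql.windowExec.buffer.spill.threshold=1048576 \
  ...

# Workaround 2: inspect / raise the per-process open-file limit.
ulimit -n          # show the current soft limit
ulimit -n 65536    # raise it for this shell (requires a permissive hard limit)
```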



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

