arrow-user mailing list archives

From Adam Shirey <adam.shi...@gmail.com>
Subject [Rust] Provide guidance on number of file descriptors needed to read Parquet file
Date Tue, 28 Jul 2020 16:53:35 GMT
I have a series of Parquet files that are 181 columns wide, and I'm
processing them in parallel (using rayon
<https://github.com/rayon-rs/rayon/>). When doing this, I ran into the OS
limit on open file descriptors (1024 by default, according to ulimit -n). My
assumption was that a reader opens one file descriptor per column per file,
so opening 5 files at 181 descriptors each *should* use about 905, plus maybe
a few more for metadata, etc. However, each file I read was consuming 208
descriptors.
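The arithmetic above can be written as a small budget calculation. This is a minimal sketch. The per-file cost is taken from the measured figure (208), not derived from the column count, because the reason for the extra descriptors isn't known here. The helper name `max_concurrent_files` and the reserve of 16 descriptors for stdio, logs, and similar handles are hypothetical, illustrative choices and not part of any Arrow/Parquet API.

```rust
/// Hypothetical helper: how many files can be open at once without
/// exceeding the descriptor limit, given a per-file cost and a reserve
/// for descriptors the process already holds (stdio, sockets, logs, ...).
fn max_concurrent_files(fd_limit: usize, fds_per_file: usize, reserved: usize) -> usize {
    if fds_per_file == 0 {
        return usize::MAX; // no per-file cost: the limit never binds
    }
    fd_limit.saturating_sub(reserved) / fds_per_file
}

fn main() {
    let fd_limit = 1024; // default soft limit reported by `ulimit -n`
    let columns = 181;
    let files = 5;

    // Naive assumption: one descriptor per column per file.
    println!("assumed for 5 files: {}", files * columns); // 905

    // Observed cost per file: 5 files exceed the 1024 limit.
    let observed_per_file = 208;
    println!("observed for 5 files: {}", files * observed_per_file); // 1040

    // Assumed reserve of 16 descriptors for everything else in the process.
    println!(
        "files that fit at 208 each: {}",
        max_concurrent_files(fd_limit, observed_per_file, 16)
    ); // 4
}
```

Once the per-file cost is known, the result of this calculation can serve as a cap on how many files are processed at the same time.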

Is there a deterministic calculation for how many file descriptors will be
used to process a file, so that one can choose an appropriate degree of
parallelism in a situation like this?
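Without a formula, one option is to measure the per-file cost empirically. This sketch counts the entries in /proc/self/fd before and after opening files. It assumes Linux, because it relies on the /proc filesystem, and it uses only the standard library. The helper `open_fd_count` is hypothetical. Opening a few ordinary files stands in for opening one Parquet file with its reader, so this does not show how any Parquet reader allocates descriptors.

```rust
use std::fs::{self, File};
use std::io;

/// Count the descriptors this process currently has open (Linux only:
/// relies on /proc). The directory handle used for the listing is itself
/// included, but it cancels out when taking a before/after difference.
fn open_fd_count() -> io::Result<usize> {
    Ok(fs::read_dir("/proc/self/fd")?.count())
}

fn main() -> io::Result<()> {
    // Stand-in for "open one Parquet file": here, three plain file handles.
    let path = std::env::temp_dir().join("fd_count_demo.txt");
    fs::write(&path, b"demo")?;

    let before = open_fd_count()?;
    let handles: Vec<File> = (0..3)
        .map(|_| File::open(&path))
        .collect::<io::Result<_>>()?;
    let after = open_fd_count()?;

    // Run single-threaded: other threads opening files would skew the diff.
    println!("descriptors opened: {}", after - before); // expected: 3 on Linux

    drop(handles); // closing the handles releases their descriptors
    fs::remove_file(&path)?;
    Ok(())
}
```

If the stand-in is replaced with the real reader setup for a single file, the difference gives the per-file cost to use in a parallelism budget.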

Thanks,
-Adam
