I have a series of Parquet files that are 181 columns wide, and I'm processing them in parallel (using rayon). Doing so, I ran into the OS limit on open file descriptors (1024 by default, according to `ulimit -n`). My assumption was that there's one file descriptor per column per file, so opening 5 files at 181 descriptors each should use about 905, plus maybe a few more for metadata. In practice, however, each file I read consumed 208 descriptors.
Is there a deterministic way to calculate how many file descriptors will be used when processing these files, so that one can choose an appropriate degree of parallelism in a situation like this?
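For reference, here is a minimal sketch of how I measured the per-file descriptor count empirically. It assumes Linux, where a process's open descriptors are listed under `/proc/self/fd`; nothing in it is specific to Parquet, and the plain `File::open` stands in for whatever the Parquet reader does internally:

```rust
use std::fs;

/// Count the file descriptors currently open in this process by
/// listing /proc/self/fd (Linux-specific).
fn count_open_fds() -> usize {
    fs::read_dir("/proc/self/fd")
        .map(|entries| entries.count())
        .unwrap_or(0)
}

fn main() {
    let before = count_open_fds();

    // Open a file and observe the delta. Swapping this for the
    // Parquet reader's open call shows how many descriptors one
    // logical "file open" actually costs.
    let _f = fs::File::open("/proc/self/status");

    let after = count_open_fds();
    println!("fds before={before}, after={after}");
}
```

Sampling the count before and after each open is how I arrived at the 208-per-file figure above.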