From: Adam Shirey
Date: Tue, 28 Jul 2020 09:53:35 -0700
Subject: [Rust] Provide guidance on number of file descriptors needed to read Parquet file
To: user@arrow.apache.org

I have a series of Parquet files that are 181 columns wide, and I'm processing them in parallel (using rayon). I ran into the OS limit (default 1024 according to ulimit -n) of open file descriptors when doing this. My assumption is that there's one file descriptor per column per file, so opening 5 files @ 181 per *should* open about 905, plus maybe a few more for metadata, etc. However, each file I read was consuming 208 descriptors.

Is there a deterministic calculation for how many file descriptors will be used to process files so that one can determine appropriate multithreading in a situation like this?

Thanks,
-Adam
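
For illustration only, the arithmetic in the message can be sketched as below. This does not answer how many descriptors the Parquet reader opens per file; it only takes a measured per-file cost (208 in the message) and the descriptor limit (1024) and derives how many files could be open at once. The helper name `max_parallel_files` and the `reserved` headroom parameter (for stdin, stdout, logs, and so on) are assumptions, not part of any Arrow or Parquet API.

```rust
/// Hypothetical helper (not an Arrow/Parquet API): given the process
/// descriptor limit, a measured per-file descriptor cost, and some
/// headroom reserved for other uses, return how many files can be
/// open concurrently without exceeding the limit.
fn max_parallel_files(fd_limit: u64, fds_per_file: u64, reserved: u64) -> u64 {
    if fds_per_file == 0 {
        // No measured cost available; refuse to guess.
        return 0;
    }
    // saturating_sub avoids underflow when the headroom exceeds the limit.
    fd_limit.saturating_sub(reserved) / fds_per_file
}

/// Total descriptors needed for `files` files at `fds_per_file` each.
fn descriptors_needed(files: u64, fds_per_file: u64) -> u64 {
    files * fds_per_file
}
```

With the numbers from the message, `descriptors_needed(5, 181)` is the expected 905, while `descriptors_needed(5, 208)` is 1040, which exceeds 1024. `max_parallel_files(1024, 208, 16)` gives 4, where the headroom of 16 is an arbitrary assumed value. A result like this could be used to cap the worker count, for example through rayon's `ThreadPoolBuilder::num_threads`.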