From: Steve Kim
Date: Sat, 13 Feb 2021 09:16:57 -0600
Subject: Re: [Rust] [DataFusion] Reading remote parquet files in S3?
To: user@arrow.apache.org

> Currently, parquet.rs only supports local disk files. Potentially, this can be done using the rusoto crate that provides a s3 client. What would be a good way to do this?
> 1. create a remote parquet reader (potentially duplicate lots of code)
> 2. create an interface to abstract away reading from local/remote files (not sure about performance if the reader blocks on every operation)

This is a great question. I think that approach (2) is superior, although it requires more work than approach (1) to design an interface that works well across multiple file stores that have different performance characteristics. To accommodate storage-specific performance optimizations, I expect that the common interface will have to be more elaborate than the current reader API.

Is it possible for the Rust reader to use the C++ implementation (https://github.com/apache/arrow/tree/master/cpp/src/arrow/filesystem)? If this reuse of implementation is feasible, then we could focus efforts on improving the C++ implementation and get the benefits in Python, Rust, etc.

In the Java ecosystem, the (non-Arrow, row-wise) Parquet reader uses the Hadoop FileSystem abstraction. This abstraction is complex, leaky, and not well specialized for read patterns that are typical for Parquet files. We can learn from these mistakes to create a superior reader interface in the Arrow/Parquet project.

Steve
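
As a rough illustration of approach (2), here is a minimal sketch of what a storage-agnostic read interface might look like. Everything named here (`RangeRead`, `LocalFile`, `InMemoryObject`, `footer_metadata_len`) is hypothetical and is not part of the parquet or arrow crates. The in-memory type only stands in for a remote object store, where each `read_range` call would map to one ranged request. The only Parquet fact relied on is the file trailer layout: a 4-byte little-endian footer length followed by the magic bytes `PAR1`.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

/// Hypothetical storage-agnostic interface: random access by byte range.
/// A remote backend would issue one ranged request per `read_range` call.
pub trait RangeRead {
    /// Total size of the underlying object in bytes.
    fn size(&self) -> io::Result<u64>;
    /// Read exactly `length` bytes starting at `offset`.
    fn read_range(&self, offset: u64, length: usize) -> io::Result<Vec<u8>>;
}

/// Local-disk backend. It reopens the file on every read to keep the sketch simple.
pub struct LocalFile {
    pub path: PathBuf,
}

impl RangeRead for LocalFile {
    fn size(&self) -> io::Result<u64> {
        Ok(std::fs::metadata(&self.path)?.len())
    }

    fn read_range(&self, offset: u64, length: usize) -> io::Result<Vec<u8>> {
        let mut file = File::open(&self.path)?;
        file.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; length];
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// In-memory stand-in for a remote object (assumption: no real S3 client here).
pub struct InMemoryObject {
    pub bytes: Vec<u8>,
}

impl RangeRead for InMemoryObject {
    fn size(&self) -> io::Result<u64> {
        Ok(self.bytes.len() as u64)
    }

    fn read_range(&self, offset: u64, length: usize) -> io::Result<Vec<u8>> {
        let past_end = || io::Error::new(io::ErrorKind::UnexpectedEof, "range past end of object");
        if offset > self.bytes.len() as u64 {
            return Err(past_end());
        }
        let start = offset as usize;
        let end = start.checked_add(length).ok_or_else(past_end)?;
        if end > self.bytes.len() {
            return Err(past_end());
        }
        Ok(self.bytes[start..end].to_vec())
    }
}

/// Reads the 8-byte Parquet trailer through the interface and returns the
/// footer metadata length. Works the same for any `RangeRead` backend.
pub fn footer_metadata_len<R: RangeRead>(source: &R) -> io::Result<u32> {
    let size = source.size()?;
    if size < 8 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "too small for a Parquet trailer"));
    }
    let tail = source.read_range(size - 8, 8)?;
    if &tail[4..8] != b"PAR1" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "missing PAR1 magic"));
    }
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}
```

With this shape, code such as `footer_metadata_len` is written once and runs unchanged against `LocalFile { path }` or any remote backend. The trait is deliberately synchronous, so it does not address the concern about blocking on every operation; a real design would likely need coarser reads (for example, fetching the trailer and footer together) or an asynchronous variant to keep round-trips to a remote store low.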