From: Luke
Date: Thu, 11 Oct 2018 14:01:03 -0400
Subject: parquet file in S3, is there a way to read a subset of all the columns in python
To: user@arrow.apache.org

I have Parquet files (each self-contained) in S3, and I want to read certain columns into a pandas DataFrame without reading the entire object out of S3.

Is this implemented? boto3 in Python supports reading from offsets in an S3 object, but I wasn't sure whether anyone has made that work with a Parquet file so that only the byte ranges corresponding to certain columns are fetched.

Thanks,
Luke
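To make the offset-based idea in the question concrete, here is a minimal sketch using only the Python standard library. It relies on one fact about the Parquet layout: a file ends with the footer metadata, followed by a 4-byte little-endian footer length and the magic bytes `PAR1`. The names `RangeReader`, `fetch`, and `read_parquet_footer` are hypothetical and exist only for this illustration. The byte string stands in for an S3 object, and the footer content is fake. Against real S3, `fetch` could plausibly wrap boto3's `get_object` with its `Range` argument (an HTTP range is inclusive at both ends, so the request would be `bytes=start-(end-1)`).

```python
import io
import struct


class RangeReader:
    """Seekable, read-only file-like object that pulls bytes on demand.

    `fetch(start, end)` is a hypothetical callable returning the bytes in
    [start, end); it is an assumption of this sketch, not a library API.
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0
        self.bytes_fetched = 0  # running total, to show how little is read

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return self._pos

    def read(self, n=-1):
        # Clamp the request to the end of the object.
        if n is None or n < 0:
            end = self._size
        else:
            end = min(self._pos + n, self._size)
        if end <= self._pos:
            return b""
        data = self._fetch(self._pos, end)
        self.bytes_fetched += len(data)
        self._pos += len(data)
        return data


def read_parquet_footer(reader):
    """Return the raw footer bytes without touching the column data."""
    # Last 8 bytes: 4-byte little-endian footer length, then b"PAR1".
    reader.seek(-8, io.SEEK_END)
    footer_len, magic = struct.unpack("<I4s", reader.read(8))
    if magic != b"PAR1":
        raise ValueError("not a Parquet file")
    reader.seek(-8 - footer_len, io.SEEK_END)
    return reader.read(footer_len)


# Stand-in for an S3 object: magic, fake column data, fake footer, trailer.
fake_footer = b"fake-metadata"
blob = (
    b"PAR1"
    + b"x" * 1000
    + fake_footer
    + struct.pack("<I", len(fake_footer))
    + b"PAR1"
)

reader = RangeReader(lambda start, end: blob[start:end], len(blob))
footer = read_parquet_footer(reader)
```

In a real file the footer is Thrift-encoded metadata that records where each column chunk starts and how long it is, so a reader can decode it and then request only the ranges for the wanted columns. In practice, `pyarrow.parquet.read_table` accepts a `columns` argument and a file-like source, so passing it a seekable object backed by ranged reads like the one above (or one provided by an S3 filesystem library) is one plausible way to avoid downloading the whole object.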