From: Wes McKinney
Date: Thu, 3 Jun 2021 16:58:38 -0500
Subject: Re: [Python] Read dictionary values of DictionaryArray without reading the whole file
To: user@arrow.apache.org

It isn't possible with the current API, but all of the library machinery exists for you to be able to obtain this without extraordinary pain (speaking as one of the people who participated in implementing direct read/write of arrow::DictionaryArray). You would need to do some work on the C++ library to externalize just the dictionary data page.

On Thu, Jun 3, 2021 at 2:55 PM Juan Galvez wrote:
>
> Hello,
>
> I have a large Parquet file written by pandas with categorical columns (which are read into Arrow as DictionaryArray). I want to get the values of the categories in Python (called "dictionary" values in Arrow) without having to read any other data from the file into memory other than metadata. Is this possible?
>
> Thank you,
> -Juan
>