From: Joris Van den Bossche
Date: Fri, 15 Jan 2021 09:37:18 +0100
Subject: Re: [Python] Possible to filter member of struct field?
To: user@arrow.apache.org

Hi Partha,

The functionality to select a nested field exists in the C++ library, but as far as I know, this is not yet exposed in the Python bindings, so the example you are showing is not yet supported in practice.

I opened a JIRA to track this feature: https://issues.apache.org/jira/browse/ARROW-11259

Best,
Joris

On Wed, 13 Jan 2021 at 19:57, PARTHA DUTTA <partha.dutta@gmail.com> wrote:
> I have a Parquet file which has a field defined as a struct:
>
> workEmail: struct<address: string>
>   child 0, address: string
>     -- field metadata --
>     PARQUET:field_id: '13'
>   -- field metadata --
>   PARQUET:field_id: '1'
>
> I am trying to write a filter as a DNF to query a specific value for
> workEmail.address but pyarrow does not seem to accept the DNF:
>
> tbl = pyarrow.parquet.read_table(filename, use_legacy_dataset=False,
>     columns=["workEmail"],
>     filters=[("workEmail.address", "=", "some@one.com")])
>
> Is this supported? If not, any other workarounds?
>
> --
> Partha Dutta
> partha.dutta@gmail.com