From: Wes McKinney
Date: Wed, 22 May 2019 14:12:09 -0500
Subject: Re: ParquetDataset Filters Question
To: user@arrow.apache.org, dev@arrow.apache.org

hi Abe -- you may have to open a JIRA about
documentation improvement and/or bug fix for this. I don't know
off-hand. Copying the dev@ list

- Wes

On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek wrote:
>
> Folks
>
> Does anyone know how to do the following with filters for ParquetDataset (DNF): A ⋀ B ⋀ (C ⋁ D)?
>
> I've tried the following without luck:
>
>> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
>>     ("col", ">=", "<>"),
>>     ("col", "<=", "<>"),
>>     [[("col", "=", "<>")], [("col", "=", "<>")]]
>> ])
>
> Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=", "<>"), and D = ("col", "=", "<>").
>
> In the above example, I get the following error:
>>
>>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 961, in __init__
>>     filters = _check_filters(filters)
>>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 93, in _check_filters
>>     for col, op, val in conjunction:
>> ValueError: not enough values to unpack (expected 3, got 2)
>
> Abe
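For reference, as I understand pyarrow's DNF `filters` form, the outer list ORs together inner lists, and each inner list ANDs together `(column, op, value)` tuples. An OR group nested inside one conjunction is not accepted. `_check_filters` appears to treat a flat list whose first element is a tuple as a single conjunction, so it tries to unpack the nested two-element list `[[C], [D]]` as a 3-tuple. That would match the "expected 3, got 2" error in the traceback. Distributing AND over OR gives A ⋀ B ⋀ (C ⋁ D) = (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D), i.e. `filters=[[A, B, C], [A, B, D]]`. The sketch below does that expansion in plain Python and checks it with a small evaluator. `to_dnf`, `matches`, `OPS`, and the numeric values are hypothetical stand-ins chosen for illustration, not pyarrow API.

```python
import operator
from itertools import product
from typing import Any, Dict, List, Sequence, Tuple

# One predicate in pyarrow's filter tuple form: (column, op, value).
Predicate = Tuple[str, str, Any]


def to_dnf(*or_clauses: Sequence[Predicate]) -> List[List[Predicate]]:
    """Hypothetical helper: expand an AND of OR-clauses into DNF.

    Each argument is one OR-clause given as a list of predicates (a clause
    with one predicate is just that predicate). Choosing one predicate from
    every clause gives one AND-list, by distributivity of AND over OR.
    """
    return [list(combo) for combo in product(*or_clauses)]


# Hypothetical comparison table covering the basic comparison operators.
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches(row: Dict[str, Any], dnf: List[List[Predicate]]) -> bool:
    """Hypothetical checker: OR across the outer list, AND within each inner list."""
    return any(
        all(OPS[op](row[col], val) for col, op, val in conjunction)
        for conjunction in dnf
    )


# Made-up values standing in for the elided "<>" placeholders.
A = ("col", ">=", 10)
B = ("col", "<=", 20)
C = ("col", "=", 12)
D = ("col", "=", 15)

# A AND B AND (C OR D)  ==  (A AND B AND C) OR (A AND B AND D)
filters = to_dnf([A], [B], [C, D])
print(filters)
print([v for v in range(30) if matches({"col": v}, filters)])  # → [12, 15]

# The list-of-lists would then be passed unchanged, e.g.:
# dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=filters)
```

As far as I know, `ParquetDataset` in pyarrow 0.13 applies these filters only at the partition level, pruning Hive-style `key=value` directories. Predicates on a column that is not a partition key may therefore not remove individual rows.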