arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: ParquetDataset Filters Question
Date Wed, 22 May 2019 19:12:09 GMT
hi Abe -- you may have to open a JIRA about documentation improvement
and/or bug fix for this. I don't know off-hand. Copying the dev@ list

- Wes

On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <abe@apache.org> wrote:
>
> Folks
>
> Does anyone know how to do the following with filters for ParquetDataset (DNF): A ⋀ B ⋀ (C ⋁ D)?
>
> I've tried the following without luck:
>
>> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
>>     ("col", ">=", "<>"),
>>     ("col", "<=", "<>"),
>>     [[("col", "=", "<>")], [("col", "=", "<>")]]
>> ])
>
>
> Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=", "<>"), and D = ("col", "=", "<>").
>
> In the above example, I get the following error:
>>
>>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 961, in __init__
>>     filters = _check_filters(filters)
>>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 93, in _check_filters
>>     for col, op, val in conjunction:
>> ValueError: not enough values to unpack (expected 3, got 2)
>
>
> Abe
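
[Editor's note: the thread above leaves the question unresolved, so here is a hedged sketch. In pyarrow's DNF `filters` convention, the outer list is a disjunction (OR) of inner lists, and each inner list is a conjunction (AND) of `(column, op, value)` tuples — tuples and nested lists cannot be mixed at the same level, which is what triggers the `ValueError` shown. A ⋀ B ⋀ (C ⋁ D) therefore has to be distributed into (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D) first. The column names `"x"`/`"y"` and the values below are placeholders, not taken from the original thread.]

```python
# Sketch only: column names and values are illustrative placeholders.
# DNF filters: outer list = OR of inner lists, inner list = AND of tuples.
# A ∧ B ∧ (C ∨ D) is distributed into (A ∧ B ∧ C) ∨ (A ∧ B ∧ D).

A = ("x", ">=", 0)    # A
B = ("x", "<=", 10)   # B
C = ("y", "=", "c1")  # C
D = ("y", "=", "c2")  # D

filters = [
    [A, B, C],  # A ∧ B ∧ C
    [A, B, D],  # A ∧ B ∧ D
]

# Every element of each conjunction now unpacks cleanly as
# (col, op, val), which is the shape _check_filters iterates over,
# so the "not enough values to unpack" error no longer occurs:
for conjunction in filters:
    for col, op, val in conjunction:
        pass

# Usage (assumes pyarrow is installed and a real dataset path):
# import pyarrow.parquet as pq
# dataset = pq.ParquetDataset("path/to/dataset", filters=filters)
```

Distributing the conjunction over the disjunction duplicates A and B across both inner lists, but that is how DNF works: each inner list must be a complete, self-contained conjunction.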
