arrow-user mailing list archives

From: Abraham Elmahrek <...@apache.org>
Subject: ParquetDataset Filters Question
Date: Tue, 21 May 2019 17:04:59 GMT
Folks,

Does anyone know how to express the following predicate with ParquetDataset
filters (which are in DNF): A ⋀ B ⋀ (C ⋁ D)?

I've tried the following without luck:

dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
    ("col", ">=", "<>"),
    ("col", "<=", "<>"),
    [[("col", "=", "<>")], [("col", "=", "<>")]]
])


Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=",
"<>"), and D = ("col", "=", "<>").

In the above example, I get the following error:

  File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 961, in __init__
    filters = _check_filters(filters)
  File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 93, in _check_filters
    for col, op, val in conjunction:
ValueError: not enough values to unpack (expected 3, got 2)
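
If I expand the expression into DNF by hand, A ⋀ B ⋀ (C ⋁ D) becomes
(A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D). Assuming the list-of-lists form means "OR of AND
groups" (I haven't been able to confirm that's the intended semantics), I would
have expected something like the sketch below to be the right shape; my guess
is that mixing bare tuples with a nested list in the same list is what
_check_filters is choking on.

import pyarrow.parquet as pq
import s3fs

# (A AND B AND C) OR (A AND B AND D)
# Outer list = OR of groups, each inner list = AND of (column, op, value) tuples.
dataset = pq.ParquetDataset(
    "<>",
    filesystem=s3fs.S3FileSystem(),
    filters=[
        [("col", ">=", "<>"), ("col", "<=", "<>"), ("col", "=", "<>")],  # A AND B AND C
        [("col", ">=", "<>"), ("col", "<=", "<>"), ("col", "=", "<>")],  # A AND B AND D
    ],
)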


Abe
