arrow-user mailing list archives

From "Uwe L. Korn" <uw...@xhochy.com>
Subject Re: ParquetDataset Filters Question
Date Thu, 23 May 2019 11:05:41 GMT
Hello Abe,

I think the problem is that you are mixing two syntaxes. We support either a "list of
tuples" (a single AND-ed conjunction) or a "list of lists of tuples" (an OR of AND-ed
conjunctions). Furthermore, since AND distributes over OR, the correct DNF for your filter
is (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D), so you should use

filters = [[("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<C>")],
           [("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<D>")]]
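To see where that filter comes from: distributing the AND over the OR in A ⋀ B ⋀ (C ⋁ D) yields one inner list per alternative of the OR group. Below is a minimal pure-Python sketch of that expansion, plus a tiny evaluator that checks the resulting DNF means the same thing as the original expression. The helper names `to_dnf`, `matches` and `OPS`, and the numeric values used for A to D, are hypothetical and for illustration only. None of them are part of pyarrow.

```python
from itertools import product
import operator

# Hypothetical example predicates in the (column, op, value) form used by
# ParquetDataset filters; the numeric values are placeholders.
A = ("col", ">=", 1)
B = ("col", "<=", 10)
C = ("col", "=", 3)
D = ("col", "=", 7)


def to_dnf(and_of_ors):
    """Expand an AND of OR-groups into DNF (an OR of AND-groups).

    Each element of `and_of_ors` is a list of alternative predicates;
    a one-element list means that predicate is always required. The
    result is a list of lists of tuples: inner lists are AND-ed, the
    outer list is OR-ed.
    """
    # product() picks one alternative from every group, i.e. it
    # distributes the AND over each OR.
    return [list(choice) for choice in product(*and_of_ors)]


# Comparison operators understood by this sketch (an illustrative subset).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(row, dnf):
    """Return True if `row` (a dict) satisfies at least one conjunction."""
    return any(
        all(OPS[op](row[col], val) for col, op, val in conjunction)
        for conjunction in dnf
    )


# A AND B AND (C OR D)
filters = to_dnf([[A], [B], [C, D]])
# filters -> [[A, B, C], [A, B, D]]
```

With real bounds substituted for the placeholders, a list of lists shaped like `filters` is what goes into the `filters=` argument of `pq.ParquetDataset`.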

Uwe

On Wed, May 22, 2019, at 9:12 PM, Wes McKinney wrote:
> hi Abe -- you may have to open a JIRA about documentation improvement
> and/or bug fix for this. I don't know off-hand. Copying the dev@ list
> 
> - Wes
> 
> On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <abe@apache.org> wrote:
> >
> > Folks
> >
> > Does anyone know how to express the following with ParquetDataset filters (DNF):
> > A ⋀ B ⋀ (C ⋁ D)?
> >
> > I've tried the following without luck:
> >
> >> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
> >>     ("col", ">=", "<>"),
> >>     ("col", "<=", "<>"),
> >>     [[("col", "=", "<>")], [("col", "=", "<>")]]
> >> ])
> >
> >
> > Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"),
> > C = ("col", "=", "<>"), and D = ("col", "=", "<>").
> >
> > In the above example, I get the following error:
> >>
> >>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 961, in __init__
> >>     filters = _check_filters(filters)
> >>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 93, in _check_filters
> >>     for col, op, val in conjunction:
> >> ValueError: not enough values to unpack (expected 3, got 2)
> >
> >
> > Abe
>
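The traceback shows `_check_filters` unpacking every entry of a conjunction as `col, op, val`. The sketch below is a simplified, hypothetical re-creation of such a check. It assumes, and this is an assumption rather than pyarrow's actual source, that a list whose first entry is a bare tuple is treated as a single conjunction. Under that assumption the mixed filter from the question fails in the same way: its third entry is a two-element list, not a 3-tuple. The name `check_filters` and the numeric values are hypothetical.

```python
def check_filters(filters):
    """Hypothetical, simplified shape check for DNF filters.

    Accepts either a flat list of (col, op, val) tuples, treated as one
    conjunction, or a list of such lists, and returns the nested form.
    An illustration only; it is not pyarrow's source code.
    """
    if not filters or any(len(f) == 0 for f in filters):
        raise ValueError("Malformed filters")
    if isinstance(filters[0][0], str):
        # The first entry is a bare tuple such as ("col", ">=", 1), so the
        # whole list is assumed to be a single conjunction.
        filters = [filters]
    for conjunction in filters:
        # Every entry of a conjunction must unpack into exactly 3 values.
        for col, op, val in conjunction:
            if not isinstance(col, str):
                raise TypeError(f"column name must be a str, got {col!r}")
    return filters


# The mixed form from the question: two bare tuples plus a nested list.
mixed = [
    ("col", ">=", 1),
    ("col", "<=", 10),
    [[("col", "=", 3)], [("col", "=", 7)]],
]

try:
    check_filters(mixed)
except ValueError as exc:
    # The nested list has only 2 elements, so 3-way unpacking fails.
    print(exc)  # not enough values to unpack (expected 3, got 2)
```

Passing either the flat list of tuples or the fully nested list of lists goes through this check unchanged; only the mix of the two trips the unpacking.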
