arrow-user mailing list archives

From Abraham Elmahrek <abra...@elmahrek.com>
Subject Re: ParquetDataset Filters Question
Date Thu, 23 May 2019 18:20:59 GMT
Thanks guys. That makes sense.

On Thu, May 23, 2019 at 4:06 AM Uwe L. Korn <uwelk@xhochy.com> wrote:

> Hello Abe,
>
> I think the problem is that you are mixing two syntaxes. We support either
> a "list of tuples" or a "list of lists of tuples". Furthermore, the
> correct DNF for your filter would be (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D), thus you
> should use
>
> filters = [[("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<C>")],
>            [("col", ">=", "<A>"), ("col", "<=", "<B>"), ("col", "=", "<D>")]]
>
> Uwe
>
>
> On Wed, May 22, 2019, at 9:12 PM, Wes McKinney wrote:
> > hi Abe -- you may have to open a JIRA about documentation improvement
> > and/or bug fix for this. I don't know off-hand. Copying the dev@ list
> >
> > - Wes
> >
> > On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <abe@apache.org> wrote:
> > >
> > > Folks
> > >
> > > Does anyone know how to do the following with filters for ParquetDataset (DNF): A ⋀ B ⋀ (C ⋁ D)?
> > >
> > > I've tried the following without luck:
> > >
> > >> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
> > >>     ("col", ">=", "<>"),
> > >>     ("col", "<=", "<>"),
> > >>     [[("col", "=", "<>")], [("col", "=", "<>")]]
> > >> ])
> > >
> > >
> > > Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=", "<>"), and D = ("col", "=", "<>").
> > >
> > > In the above example, I get the following error:
> > >>
> > >>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 961, in __init__
> > >>     filters = _check_filters(filters)
> > >>   File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 93, in _check_filters
> > >>     for col, op, val in conjunction:
> > >> ValueError: not enough values to unpack (expected 3, got 2)
> > >
> > >
> > > Abe
> >
>
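
For readers who need to build such filters programmatically, the rewrite Uwe describes (distributing A ⋀ B over C ⋁ D) can be sketched in plain Python. This is only an illustration: the helper name to_dnf and the placeholder values "<A>" through "<D>" are hypothetical and not part of pyarrow.

```python
def to_dnf(common, alternatives):
    """Expand `common AND (alt_1 OR alt_2 OR ...)` into a list of conjunctions.

    common:       list of (column, op, value) tuples that must all hold
    alternatives: list of conjunctions (each a list of tuples), any one of
                  which may hold
    Returns a "list of lists of tuples", one inner list per OR branch.
    """
    return [list(common) + list(alt) for alt in alternatives]


# Placeholder predicates standing in for the real column bounds and values.
A = ("col", ">=", "<A>")
B = ("col", "<=", "<B>")
C = ("col", "=", "<C>")
D = ("col", "=", "<D>")

# A AND B AND (C OR D)  ->  (A AND B AND C) OR (A AND B AND D)
filters = to_dnf([A, B], [[C], [D]])

# Every entry in every conjunction is a flat 3-tuple, so nothing is nested
# one level too deep, which is what the mixed syntax in the question did.
assert all(len(pred) == 3 for conj in filters for pred in conj)
```

The resulting list has the same shape as the filters value in Uwe's reply and would be passed as filters=filters to pq.ParquetDataset in the same way as in the original question.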
