arrow-user mailing list archives

From Joris Van den Bossche <jorisvandenboss...@gmail.com>
Subject Re: PyArrow Compute is_in Usage
Date Thu, 19 Nov 2020 16:18:45 GMT
Hi,

The "is_in" docstring does not make this obvious, but the second argument
must be passed as a keyword argument named "value_set".
Small example:

In [19]: pc.is_in(pa.array(["a", "b", "c", "d"]), value_set=pa.array(["a", "c"]))
Out[19]:
<pyarrow.lib.BooleanArray object at 0x7f508af95ac8>
[
  true,
  false,
  true,
  false
]

The "value_set" keyword is one of the parameters of pc.SetLookupOptions.

Best,
Joris

On Wed, 18 Nov 2020 at 16:43, Vibhatha Abeykoon <vibhatha@gmail.com> wrote:

> Hello,
>
> I am working on a dataset API on top of Arrow kernels. I am looking into
> the usage of the *is_in* function in the compute API.
>
> I couldn't figure out how arguments are passed for an is_in check. A simple
> scenario would be:
>
>
> *cylon_tb.from_list([[2,1], [1,0]])*
> *cylon_tb.isin([2])*
>
> Is this similar to pandas isin
> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html)?
> If not, how could we use the *is_in* op?
>
> With Regards,
> Vibhatha Abeykoon
>
