arrow-user mailing list archives

From Vibhatha Abeykoon <vibha...@gmail.com>
Subject Re: PyArrow Compute is_in Usage
Date Thu, 19 Nov 2020 17:05:43 GMT
Thank you, Joris.

I will look into pc.SetLookupOptions.

With Regards,
Vibhatha Abeykoon


On Thu, Nov 19, 2020 at 12:02 PM Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:

> Hi,
>
> The "is_in" docstring does not make this clear, but you need to pass
> the second argument as a keyword argument named "value_set". A small
> example:
>
> In [19]: pc.is_in(pa.array(["a", "b", "c", "d"]),
>     ...:          value_set=pa.array(["a", "c"]))
> Out[19]:
> <pyarrow.lib.BooleanArray object at 0x7f508af95ac8>
> [
>   true,
>   false,
>   true,
>   false
> ]
>
> You can find this keyword among the parameters of pc.SetLookupOptions.
>
> Best,
> Joris
>
> On Wed, 18 Nov 2020 at 16:43, Vibhatha Abeykoon <vibhatha@gmail.com> wrote:
>
>> Hello,
>>
>> I am working on a dataset API on top of Arrow kernels, and I am looking
>> into how to use the *is_in* function in the compute API.
>>
>> I couldn't figure out how arguments are passed for an is_in check. A
>> simple scenario would be:
>>
>>
>> *cylon_tb.from_list([[2,1], [1,0]])*
>> *cylon_tb.isin([2])*
>>
>> Is this similar to pandas isin
>> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html)?
>> If not, how could we use the *is_in* op?
>>
>> With Regards,
>> Vibhatha Abeykoon
>>
>
