arrow-user mailing list archives

From Neal Richardson <neal.p.richard...@gmail.com>
Subject Re: Cast string array to number/boolean with invalid values
Date Sat, 30 May 2020 14:48:40 GMT
Sounds reasonable. Could you please open a JIRA issue?

Neal

On Sat, May 30, 2020 at 1:01 AM Yue Ni <niyue.com@gmail.com> wrote:

> Hi there,
>
> I see that Arrow compute provides a Cast API that lets users cast from
> string to number/boolean values, but sometimes the strings contain invalid
> values that cannot be cast to a number/boolean (sorry, the data is really
> messy), for example in a string array like ["1", "2", "3", "None", ""].
> I wonder if there is any way to handle those invalid values during
> casting.
>
> From the code I have read (cast.h/cast.cc), it seems the cast currently
> fails and returns as soon as it hits an invalid value. I wonder if there is
> any way to ask the Cast API to return NULL for invalid values, so that it
> is easier to process these NULL values later.
>
> Since it is rarely possible to guarantee that all string values in an
> array are valid, **any** invalid value in an array or in the entire data
> set makes the cast fail. Users of the Cast API then have to figure out by
> themselves which value in the array is invalid, which is not easy to do
> programmatically (only an error status message is set in the context).
> IMHO the following would be a better default strategy when casting from
> string to number/boolean:
> 1) when an invalid value is found, set its result to NULL
> 2) set an error status indicating that this array cast encountered
> invalid values
> 3) continue casting the remaining elements in the array
> But I believe there are also users who prefer bailing out as soon as
> possible, so it would be great if we could provide different cast options
> to make both strategies possible.
>
> Thanks so much.
>
> Regards,
> Yue
>
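
For illustration only, the two strategies described in the quoted message can be sketched in plain Python. This is not the Arrow Cast API: the function name cast_strings_to_int and its strict parameter are hypothetical, and Python's int() is used as a stand-in parser, so its idea of a valid number may differ from Arrow's.

```python
from typing import List, Optional, Tuple


def cast_strings_to_int(
    values: List[Optional[str]], strict: bool = False
) -> Tuple[List[Optional[int]], List[int]]:
    """Hypothetical helper: cast strings to ints.

    With strict=False, invalid entries become None and their indices are
    reported. With strict=True, the first invalid entry raises ValueError.
    """
    result: List[Optional[int]] = []
    invalid_indices: List[int] = []
    for i, value in enumerate(values):
        if value is None:
            # Existing nulls pass through and are not counted as errors.
            result.append(None)
            continue
        try:
            result.append(int(value))
        except ValueError:
            if strict:
                # Bail-out strategy: stop at the first invalid value.
                raise ValueError(f"invalid value {value!r} at index {i}") from None
            # Lenient strategy: record a null, remember where, keep going.
            result.append(None)
            invalid_indices.append(i)
    return result, invalid_indices


converted, invalid = cast_strings_to_int(["1", "2", "3", "None", ""])
print(converted)  # [1, 2, 3, None, None]
print(invalid)    # [3, 4]
```

Returning the indices of the failed entries alongside the result plays the role of the "error status" in step 2, and it lets the caller locate the bad values programmatically instead of parsing an error message.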
