arrow-user mailing list archives

From Wes McKinney <>
Subject Re: Cast string array to number/boolean with invalid values
Date Sat, 30 May 2020 15:01:45 GMT

On Sat, May 30, 2020 at 9:56 AM Neal Richardson
<> wrote:
> Sounds reasonable, could you please open a JIRA issue?
> Neal
> On Sat, May 30, 2020 at 1:01 AM Yue Ni <> wrote:
>> Hi there,
>> I see that Arrow compute provides a Cast API that lets users cast strings to number/boolean values, but sometimes the strings contain invalid values that cannot be cast to a number/boolean (sorry, the data is really messy), for example the string array ["1", "2", "3", "None", ""]. I wonder if there is any way to handle those invalid values during the cast.
>> Currently, from the code I read (cast.h/, it seems the cast will fail and return as soon as it hits an invalid value. Is there any way I can ask the Cast API to return NULL for invalid values, so that it is easier to process these NULL values later?
>> And since it is rarely possible to guarantee that all string values in an array are valid, **any** invalid value in an array or in an entire data set will make the cast fail. Users of the cast API then have to figure out by themselves which value in the array is invalid, which is not easy to do programmatically (only an error status message is set in the context). IMHO the following strategy could be a better default when casting from string to number/boolean:
>> 1) when finding an invalid value, set NULL as its value
>> 2) set an error status indicating this array casting has some invalid values
>> 3) continue casting the remaining elements in the array
>> But I believe there are also users who prefer to bail out as soon as possible, so it would be great if we could provide different cast options that make both strategies possible.
>> Thanks so much.
>> Regards,
>> Yue
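For illustration only, here is a minimal sketch of the strategy proposed above, written in plain Python rather than against Arrow's C++ cast kernels. The names `cast_strings`, `OnInvalid`, `CastError` and `parse_bool` are hypothetical and not part of Arrow's API. The `on_invalid` switch stands in for the kind of cast option suggested above, covering both NULL-and-continue and bailing out on the first invalid value, and the returned list of failed indices plays the role of the "some values were invalid" status, so the caller does not have to dig the position out of an error message.

```python
from enum import Enum
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


class OnInvalid(Enum):
    """Hypothetical cast option: what to do with a string that cannot be parsed."""
    RAISE = "raise"  # bail out on the first invalid value (fail fast)
    NULL = "null"    # emit None (NULL) for it and keep going


class CastError(ValueError):
    """Raised in RAISE mode; records where the offending element is."""

    def __init__(self, index: int, value: str) -> None:
        super().__init__(f"cannot cast {value!r} at index {index}")
        self.index = index
        self.value = value


def parse_bool(s: str) -> bool:
    """Assumed spelling rules for this sketch, not necessarily Arrow's."""
    lowered = s.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {s!r}")


def cast_strings(
    values: Sequence[Optional[str]],
    parse: Callable[[str], T],
    on_invalid: OnInvalid = OnInvalid.NULL,
) -> Tuple[List[Optional[T]], List[int]]:
    """Cast each string with `parse`, which must raise ValueError on bad input.

    Returns the cast values plus the indices of the elements that failed.
    """
    out: List[Optional[T]] = []
    invalid: List[int] = []
    for i, s in enumerate(values):
        if s is None:
            out.append(None)  # an input that is already null stays null
            continue
        try:
            out.append(parse(s))
        except ValueError:
            if on_invalid is OnInvalid.RAISE:
                raise CastError(i, s) from None
            out.append(None)   # 1) the invalid value becomes NULL
            invalid.append(i)  # 2) record it so the caller can report it
        # 3) in NULL mode, carry on with the next element
    return out, invalid
```

With the default NULL mode, `cast_strings(["1", "2", "3", "None", ""], int)` returns `([1, 2, 3, None, None], [3, 4])`; with `OnInvalid.RAISE` the same call raises `CastError` pointing at index 3. Python's `int()` is more lenient than a strict parser (it accepts surrounding whitespace and digit-separating underscores), so this only models the control flow, not Arrow's parsing rules.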
