arrow-user mailing list archives

From Yue Ni <niyue....@gmail.com>
Subject Re: Cast string array to number/boolean with invalid values
Date Sun, 31 May 2020 14:12:43 GMT
Thanks Neal and Wes. https://issues.apache.org/jira/browse/ARROW-1489 is
exactly what I am searching for.

On Sat, May 30, 2020 at 11:02 PM Wes McKinney <wesmckinn@gmail.com> wrote:

> It's https://issues.apache.org/jira/browse/ARROW-1489
>
> On Sat, May 30, 2020 at 9:56 AM Neal Richardson
> <neal.p.richardson@gmail.com> wrote:
> >
> > Sounds reasonable, could you please open a JIRA issue?
> >
> > Neal
> >
> > On Sat, May 30, 2020 at 1:01 AM Yue Ni <niyue.com@gmail.com> wrote:
> >>
> >> Hi there,
> >>
> >> I see that Arrow compute provides a Cast API that lets users cast
> >> string values to number/boolean values, but sometimes the strings
> >> include values that cannot be cast to a number/boolean (sorry, the
> >> data is really messy), for example a string array like ["1", "2", "3",
> >> "None", ""]. I wonder if there is any way to handle those invalid
> >> values during casting.
> >>
> >> From the code I have read (cast.h/cast.cc), it seems the cast
> >> currently fails and returns when it encounters an invalid value. I
> >> wonder if there is any way to ask the Cast API to return NULL for
> >> invalid values, so that it is easier to process these NULL values later.
> >>
> >> And since it is rarely possible to guarantee that all string values in
> >> an array are valid, **any** invalid value in an array or the entire data
> >> set will make the cast fail. This requires users of the Cast API to
> >> figure out by themselves which value in the array is invalid, which is
> >> not easy to do programmatically (only an error status message is set in
> >> the context). IMHO the following could be a better default strategy
> >> when casting from string to number/boolean:
> >> 1) when an invalid value is found, set its value to NULL
> >> 2) set an error status indicating that the array contained some invalid
> >> values
> >> 3) continue casting the remaining elements in the array
> >> But I believe there are also users who prefer bailing out as soon as
> >> possible, so it would be great if we could provide different cast
> >> options to make both strategies possible.
> >>
> >> Thanks so much.
> >>
> >> Regards,
> >> Yue
>
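
For readers of the archive, the two strategies discussed in the quoted message (NULL on invalid values versus bailing out early) can be sketched roughly as follows. This is plain Python over lists, not Arrow's C++ Cast API; `cast_strings`, `parse_bool`, and the `strict` flag are hypothetical names made up for illustration, and the set of accepted boolean spellings is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def parse_bool(text: str) -> bool:
    """Parse a boolean; the accepted spellings here are an assumption."""
    lowered = text.strip().lower()
    if lowered in ("true", "1"):
        return True
    if lowered in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


def cast_strings(
    values: Sequence[Optional[str]],
    parse: Callable[[str], T],
    strict: bool = False,
) -> Tuple[List[Optional[T]], List[int]]:
    """Cast strings with `parse`, returning (results, invalid_indices).

    strict=False: invalid entries become None (NULL), their positions are
    recorded, and casting continues with the remaining elements.
    strict=True: raise on the first invalid entry (bail out early).
    """
    results: List[Optional[T]] = []
    invalid: List[int] = []
    for index, value in enumerate(values):
        if value is None:
            # An input that is already NULL stays NULL and is not an error.
            results.append(None)
            continue
        try:
            results.append(parse(value))
        except ValueError:
            if strict:
                raise ValueError(
                    f"invalid value {value!r} at index {index}"
                ) from None
            results.append(None)
            invalid.append(index)
    return results, invalid


# The example array from the quoted message.
numbers, bad = cast_strings(["1", "2", "3", "None", ""], int)
print(numbers)  # [1, 2, 3, None, None]
print(bad)      # [3, 4]
```

Returning the invalid positions alongside the result plays the role of the "error status" in step 2 while still letting the caller locate the bad values programmatically; a real implementation might expose this through a status object or a cast option instead.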
