arrow-user mailing list archives

From Yue Ni <niyue....@gmail.com>
Subject Cast string array to number/boolean with invalid values
Date Sat, 30 May 2020 08:00:41 GMT
Hi there,

I see that Arrow compute provides a Cast API that lets users cast string
values to numbers/booleans, but sometimes the strings include values that
cannot be cast to a number/boolean (sorry, the data is really messy), for
example a string array like ["1", "2", "3", "None", ""]. I wonder if there
is any way to handle those invalid values during casting.

From the code I have read so far (cast.h/cast.cc), it seems the cast fails
and returns as soon as it hits an invalid value. Is there any way to ask
the Cast API to return NULL for invalid values instead, so that they are
easier to process later?

Since it is rarely possible to guarantee that all string values in an array
are valid, **any** invalid value in an array (or in the entire data set)
makes the cast fail. Users of the Cast API then have to figure out by
themselves which value in the array is invalid, which is not easy to do
programmatically (only an error status message is set in the context). IMHO
the following would be a better default strategy when casting from string
to number/boolean:
1) when an invalid value is found, set its result to NULL
2) set an error status indicating that the array contained invalid values
3) continue casting the remaining elements in the array
But I believe there are also users who prefer bailing out as soon as
possible, so it would be great if cast options could make both strategies
possible.
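
To make the two strategies concrete, here is a minimal standalone sketch in
plain C++17. It does not use Arrow at all: InvalidPolicy, LenientCastResult,
ParseInt64 and CastStringsToInt64 are hypothetical names invented for this
illustration, std::nullopt stands in for NULL, and the list of invalid
indices stands in for the "error status" from step 2.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names throughout; this is NOT the Arrow Cast API.
enum class InvalidPolicy {
  kFail,  // bail out on the first invalid value
  kNull   // emit NULL for invalid values and keep going
};

struct LenientCastResult {
  std::vector<std::optional<int64_t>> values;  // std::nullopt stands in for NULL
  std::vector<std::size_t> invalid_indices;    // positions that failed to parse
};

// Parses the whole string as a base-10 int64; returns nullopt on any failure
// (empty string, non-digits, trailing characters, out-of-range value).
inline std::optional<int64_t> ParseInt64(const std::string& s) {
  int64_t out = 0;
  const char* first = s.data();
  const char* last = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(first, last, out);
  if (ec != std::errc() || ptr != last) return std::nullopt;
  return out;
}

inline LenientCastResult CastStringsToInt64(const std::vector<std::string>& input,
                                            InvalidPolicy policy) {
  LenientCastResult result;
  result.values.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    std::optional<int64_t> parsed = ParseInt64(input[i]);
    if (!parsed) {
      if (policy == InvalidPolicy::kFail) {
        throw std::invalid_argument("invalid value at index " + std::to_string(i));
      }
      result.invalid_indices.push_back(i);  // step 2: remember what went wrong
    }
    result.values.push_back(parsed);  // step 1: NULL for invalid; step 3: continue
  }
  return result;
}
```

With InvalidPolicy::kNull, the example array above would yield three values
followed by two NULLs, with indices 3 and 4 reported as invalid; with
InvalidPolicy::kFail the call throws at index 3, which corresponds to the
bail-out behavior.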

Thanks so much.

Regards,
Yue
