arrow-user mailing list archives

From Sandy Ryza <sandyr...@gmail.com>
Subject Re: check whether pandas type is convertible to arrow type
Date Mon, 01 Jun 2020 15:45:37 GMT
Ah - I hadn't thought about how the object dtype complicates things.

What I'm trying to do at a higher level is maybe wacky:

   - I want a set of parquet files to be read/written by PySpark and Pandas
   interchangeably.
   - For each file, I want to specify, in code, the column types
   expected in the file.
   - Before writing out a Pandas DataFrame to a file, I want to check
   whether it matches the expected column types for the file.  I don't need to
   provably catch every violation, but the more I can catch, the better.
   - I'm considering using pyarrow types for expressing the expected column
   types for each file.

Does that make sense?  Is there a different way you'd advise accomplishing
this?

On 2020/05/30 15:07:05, Wes McKinney <w...@gmail.com> wrote:
> I don't think there is specifically (one could be added in theory). Is
> the goal to determine whether `pyarrow.array(pandas_object)` will
> succeed or not, or something else? Since a lot of pandas data is
> opaquely represented with object dtype it can be tricky unless you
> want to go to the expense of using `pandas.lib.infer_dtype` to
> determine the effective logical type of the values.
>
> On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <sa...@gmail.com> wrote:
> >
> > Hi all,
> >
> > If I have a pandas dtype and an arrow type, is there a pyarrow API that
> > allows me to check whether the pandas dtype is convertible to the arrow
> > type?
> >
> > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work
> > in most cases, because pandas dtypes tend to be at least as wide as
> > equivalent arrow types, but I'm wondering whether there's something more
> > principled.
> >
> > Any help much appreciated,
> > Sandy
