arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: check whether pandas type is convertible to arrow type
Date Wed, 03 Jun 2020 17:14:24 GMT
You can specify an explicit Arrow schema when converting a
pandas.DataFrame to pyarrow.Table or RecordBatch. So it might be
better to write out the schema you want (kind of like when you write
the schema in SQL with CREATE TABLE ...) and then ensure that pandas
objects are coerced into that?

On Mon, Jun 1, 2020 at 10:45 AM Sandy Ryza <sandyryza@gmail.com> wrote:
>
> Ah, I hadn't thought about how the object dtype complicates things.
>
> What I'm trying to do at a higher level is maybe wacky:
>
> I want a set of Parquet files to be read and written by PySpark and pandas interchangeably.
> For each file, I want to specify, in code, the column types expected in the file.
> Before writing out a pandas DataFrame to a file, I want to check whether it matches the
> expected column types for the file. I don't need to provably catch every violation, but the
> more I can catch, the better.
> I'm considering using pyarrow types to express the expected column types for each
> file.
>
> Does that make sense? Is there a different way you'd advise accomplishing this?
>
> On 2020/05/30 15:07:05, Wes McKinney <w...@gmail.com> wrote:
> > I don't think there is one specifically (one could be added in theory). Is
> > the goal to determine whether `pyarrow.array(pandas_object)` will
> > succeed or not, or something else? Since a lot of pandas data is
> > opaquely represented with object dtype, it can be tricky unless you
> > want to go to the expense of using `pandas.lib.infer_dtype` to
> > determine the effective logical type of the values.
> >
> > On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <sa...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > If I have a pandas dtype and an Arrow type, is there a pyarrow API that allows
> > > me to check whether the pandas dtype is convertible to the Arrow type?
> > >
> > > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work in
> > > most cases, because pandas dtypes tend to be at least as wide as equivalent Arrow types, but
> > > I'm wondering whether there's something more principled.
> > >
> > > Any help much appreciated,
> > > Sandy
> > >
> >
