arrow-dev mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: Pandas timestamp
Date Tue, 25 Apr 2017 18:52:37 GMT
hi Bryan,

You will want to create DataFrame objects having datetime64[ns] columns.
There are some examples in the pyarrow test suite:

https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_convert_pandas.py#L324

You can convert an array of datetime.datetime objects to datetime64[ns]
dtype with pandas.to_datetime:

In [15]: df = pd.DataFrame(data)

In [16]: df['timestamp_t'] = pd.to_datetime(df.timestamp_t)

In [17]: df.dtypes
Out[17]:
timestamp_t    datetime64[ns]
dtype: object
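
For reference, here is a self-contained version of the session above that can be run as a script. It is only a sketch: the sample values are made up, the column is forced to object dtype to mimic the situation in the question, and the explicit astype call is an assumption added to pin the unit to nanoseconds regardless of which unit a given pandas version infers.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data: plain (tz-naive) datetime.datetime objects.
values = [datetime(2011, 1, 1, 1, 1, 1), datetime(2012, 2, 2, 2, 2, 2)]

# Force object dtype so the column starts out like the one in the question.
df = pd.DataFrame({"timestamp_t": pd.Series(values, dtype=object)})

# Convert the object column to a datetime64 column.
df["timestamp_t"] = pd.to_datetime(df["timestamp_t"])

# Assumption: pin the resolution to nanoseconds explicitly.
df["timestamp_t"] = df["timestamp_t"].astype("datetime64[ns]")

print(df.dtypes)
```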

pd.to_datetime does not seem to work with the NaiveTZ object here (if Jeff
Reback is reading, maybe he can explain why); why do you need that for
tz-naive data? If that's something we absolutely need fixed in pandas, we
should try to do it right away since the 0.20 rc is pending right now.
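
If the tzinfo turns out not to be needed on the pandas side, one possible workaround is the one suggested in the question: drop the tzinfo before building the DataFrame. The sketch below assumes that approach is acceptable for the test; the NaiveTZ class is copied from the question and the variable names are illustrative only.

```python
from datetime import datetime, tzinfo

import pandas as pd


class NaiveTZ(tzinfo):
    """tzinfo from the question: reports no UTC offset and no DST."""

    def utcoffset(self, date_time):
        return None

    def dst(self, date_time):
        return None


values = [datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())]

# Strip the tzinfo so each value is an ordinary tz-naive datetime.
stripped = [v.replace(tzinfo=None) for v in values]

# Start from an object column, then convert it to a datetime64 column.
series = pd.to_datetime(pd.Series(stripped, dtype=object))
df = pd.DataFrame({"timestamp_t": series})

print(df.dtypes)
```

Whether this is appropriate depends on why the tzinfo is required for the Arrow conversion in the first place.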

- Wes

On Tue, Apr 25, 2017 at 1:38 PM, Bryan Cutler <cutlerb@gmail.com> wrote:

> I am writing a unit test to check that a Pandas DataFrame made by Arrow
> is equal to one constructed directly from data.  The timestamp values are
> Python datetime objects with a timezone tzinfo object.  When I compare the
> results, the values are equal but the schema is not.  Using Arrow the type
> is "datetime64[ns]" and without it the type is "object".  Without a tzinfo,
> the types match, but I do need it there for the conversion with Arrow data.
> I could just replace the tzinfo for the Pandas DataFrame; it is a naive
> timezone with utcoffset=None.  Does anyone know another way to produce
> compatible types?  I do need the data to be compatible with Spark too.
> Hopefully this makes sense. I could attach some code if that would help,
> thanks! Here is a sample of the data:
>
> from datetime import datetime, tzinfo
>
> import pandas as pd
>
> class NaiveTZ(tzinfo):
>     def utcoffset(self, date_time):
>         return None
>
>     def dst(self, date_time):
>         return None
>
> data = {"timestamp_t": [datetime(2011, 1, 1, 1, 1, 1, tzinfo=NaiveTZ())]}
>
> pd.DataFrame(data)
>
