arrow-dev mailing list archives

From "Wes McKinney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ARROW-376) Python: Convert non-range Pandas indices (optionally) to Arrow
Date Thu, 02 Mar 2017 23:16:45 GMT

    [ https://issues.apache.org/jira/browse/ARROW-376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15893223#comment-15893223 ]

Wes McKinney commented on ARROW-376:
------------------------------------

[~ahnj] If you don't mind, I will take care of this one. It requires some work to expose
the {{custom_metadata}} fields in the file metadata.

> Python: Convert non-range Pandas indices (optionally) to Arrow
> --------------------------------------------------------------
>
>                 Key: ARROW-376
>                 URL: https://issues.apache.org/jira/browse/ARROW-376
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Assignee: Wes McKinney
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.3.0
>
>
> Currently, the index of a Pandas DataFrame is completely ignored during Pandas-to-Arrow
> conversion. We should add an option to also convert the index to an Arrow column if it is
> not a simple range index.
> The condition for a simple index should be
> {{isinstance(df.index, pd.RangeIndex) and (df.index._start == 0) and (df.index._stop == len(df.index)) and (df.index._step == 1)}}.
> In this case, we can always skip the index conversion. Otherwise, a new column shall be
> created in the Arrow table, using the index's name as the column name. Additionally, that
> column should carry a metadata annotation marking it as derived from a Pandas index, so
> that on a round trip it is restored as the index of the DataFrame.
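A minimal pandas-only sketch of the proposed behavior, assuming a plain dict stands in for the Arrow file's custom metadata. The function names, the metadata key `pandas_index_columns`, and the `__index_level_{i}__` placeholder for unnamed levels are all hypothetical and chosen for illustration. The check uses the public `start`/`stop`/`step` attributes that newer pandas versions expose instead of the private `_start`/`_stop`/`_step` from the issue text.

```python
import json

import pandas as pd

# Hypothetical metadata key; in Arrow this would live in the schema's custom metadata.
INDEX_METADATA_KEY = "pandas_index_columns"
# Hypothetical placeholder prefix for unnamed index levels.
UNNAMED_PREFIX = "__index_level_"


def is_default_index(df):
    """True if df.index is the trivial RangeIndex 0, 1, ..., len(df) - 1."""
    index = df.index
    return (
        isinstance(index, pd.RangeIndex)
        and index.start == 0
        and index.stop == len(index)
        and index.step == 1
    )


def index_to_columns(df):
    """Move a non-trivial index into ordinary columns and record which ones they are."""
    if is_default_index(df):
        # Simple range index: skip the conversion and attach no metadata.
        return df, {}
    # Use the index's own name(s) as column names; unnamed levels get a placeholder.
    names = [
        name if name is not None else f"{UNNAMED_PREFIX}{i}__"
        for i, name in enumerate(df.index.names)
    ]
    out = df.copy()
    out.index = out.index.set_names(names)
    out = out.reset_index()  # index level(s) become the leading column(s)
    return out, {INDEX_METADATA_KEY: json.dumps(names)}


def columns_to_index(df, metadata):
    """Inverse of index_to_columns: rebuild the index from the annotated columns."""
    if INDEX_METADATA_KEY not in metadata:
        return df
    names = json.loads(metadata[INDEX_METADATA_KEY])
    restored = df.set_index(names)
    # Placeholder names map back to unnamed levels.
    restored.index = restored.index.set_names(
        [None if isinstance(n, str) and n.startswith(UNNAMED_PREFIX) else n for n in names]
    )
    return restored
```

Writing the returned `metadata` into an actual Arrow table is omitted so that only pandas is required. Released pyarrow versions later implemented this behavior in `Table.from_pandas` (see its `preserve_index` parameter) using their own metadata layout, which this sketch does not attempt to reproduce.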



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
