arrow-dev mailing list archives

From "Miki Tebeka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ARROW-539) [Python] Support reading Parquet datasets with standard partition directory schemes
Date Tue, 14 Mar 2017 19:11:41 GMT

    [ https://issues.apache.org/jira/browse/ARROW-539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924820#comment-15924820 ]

Miki Tebeka commented on ARROW-539:
-----------------------------------

We can either do it at the Arrow level and return a table with extra fields generated from
the directory structure, or do it at the pandas level: read only the values from the
Parquet files and then generate the DataFrame columns from the directory structure.

Which is better?
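
For illustration, here is a minimal sketch of the second option. It assumes Hive-style
`key=value` directory names, which is an assumption: the issue only says "standard
partition directory schemes". The helper names and the file path are hypothetical, and
plain dicts stand in for DataFrame rows so the example needs no third-party packages.

```python
from pathlib import PurePosixPath


def partition_values(relative_path):
    """Return {key: value} parsed from 'key=value' directory components."""
    values = {}
    # Skip the last component, which is the file name itself.
    for part in PurePosixPath(relative_path).parts[:-1]:
        key, sep, value = part.partition("=")
        if sep and key:  # ignore directories that are not key=value
            values[key] = value
    return values


def add_partition_columns(rows, relative_path):
    """Append the partition values to every row read from one file."""
    extra = partition_values(relative_path)
    return [{**row, **extra} for row in rows]


# Hypothetical file layout and already-decoded rows, for illustration only.
rows = [{"value": 1.5}, {"value": 2.5}]
result = add_partition_columns(rows, "year=2017/month=3/part-0.parquet")
print(result)
```

The first option would do the same parsing, but attach the values as extra fields on the
Arrow table before any conversion to pandas. Note that the parsed values are strings
here; inferring their types is left out of this sketch.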

> [Python] Support reading Parquet datasets with standard partition directory schemes
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-539
>                 URL: https://issues.apache.org/jira/browse/ARROW-539
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Wes McKinney
>         Attachments: partitioned_parquet.tar.gz
>
>
> Currently, we only support multi-file directories with a flat structure (non-partitioned).




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)