crunch-dev mailing list archives

From "Alex Kozlov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-310) There should be a way to specify projection schema for Parquet files
Date Wed, 11 Dec 2013 05:56:07 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845109#comment-13845109 ]

Alex Kozlov commented on CRUNCH-310:
------------------------------------

This will definitely work for me.  Now we need to add an option to add parquet filter(s)!


> There should be a way to specify projection schema for Parquet files
> --------------------------------------------------------------------
>
>                 Key: CRUNCH-310
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-310
>             Project: Crunch
>          Issue Type: Improvement
>          Components: IO
>            Reporter: Alex Kozlov
>            Priority: Critical
>         Attachments: 0001-CRUNCH-310-A-fix-for-projected-schemas.txt, CRUNCH-310.patch
>
>
> Currently the projection schema is set based on the ptype:
> {code}
>   private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
>     return FormatBundle.forInput(AvroParquetInputFormat.class)
>         .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
>         // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
>         // doesn't work with CombineFileInputFormat
>         .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
>   }
> {code}
> Sometimes a user wants only a subset of columns as the projection, so there needs to be a mechanism to supply the desired projection schema.
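
To illustrate what a projection schema looks like, here is a minimal sketch in plain Java with no Avro, Parquet, or Crunch dependency. It builds the JSON text of an Avro record schema that keeps only the requested fields. `ProjectionSketch`, `projectionSchemaJson`, and the `Event` record with its fields are hypothetical names made up for this example. The assumption is that a string of this shape could be supplied under `AvroReadSupport.AVRO_REQUESTED_PROJECTION` in place of `ptype.getSchema().toString()`; how Crunch would expose that option is not shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Crunch, Avro, or Parquet.
final class ProjectionSketch {

    // Returns the JSON text of an Avro record schema containing only the wanted fields.
    // fullFields maps field name -> Avro primitive type name (e.g. "long", "string").
    // Assumes names are plain Avro identifiers, so no JSON escaping is done.
    static String projectionSchemaJson(String recordName,
                                       Map<String, String> fullFields,
                                       List<String> wanted) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, String> e : fullFields.entrySet()) {
            if (wanted.contains(e.getKey())) {
                fields.add("{\"name\":\"" + e.getKey()
                        + "\",\"type\":\"" + e.getValue() + "\"}");
            }
        }
        return "{\"type\":\"record\",\"name\":\"" + recordName
                + "\",\"fields\":" + fields + "}";
    }

    public static void main(String[] args) {
        // Full schema of a made-up record; insertion order is preserved.
        Map<String, String> full = new LinkedHashMap<>();
        full.put("id", "long");
        full.put("name", "string");
        full.put("payload", "bytes");

        // Keep only two of the three columns.
        System.out.println(projectionSchemaJson("Event", full, Arrays.asList("id", "name")));
    }
}
```

In a real job the projected schema would more likely be built with Avro's own `Schema` classes than by string concatenation; the sketch only shows that the projection is an ordinary record schema with fewer fields than the full one.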



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
