crunch-dev mailing list archives

From "Alex Kozlov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-310) There should be a way to specify projection schema for Parquet files
Date Mon, 09 Dec 2013 23:52:07 GMT
Alex Kozlov created CRUNCH-310:
----------------------------------

             Summary: There should be a way to specify projection schema for Parquet files
                 Key: CRUNCH-310
                 URL: https://issues.apache.org/jira/browse/CRUNCH-310
             Project: Crunch
          Issue Type: Improvement
          Components: IO
            Reporter: Alex Kozlov
            Priority: Critical


Currently the projection schema is set based on the ptype:

{code}
  private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
    return FormatBundle.forInput(AvroParquetInputFormat.class)
        .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
        // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
        // doesn't work with CombineFileInputFormat
        .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
  }
{code}

Sometimes a user wants to read only a subset of the columns. There needs to be a mechanism for
supplying the desired projection schema separately from the ptype.
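
As a rough illustration of the request, the sketch below shows one way the fallback could work: use an explicitly supplied projection schema when there is one, otherwise keep today's behavior and use the ptype's schema. It is only a sketch under stated assumptions, not a proposed patch. To stay self-contained it uses a plain Map in place of FormatBundle and schema JSON strings in place of AvroType; the class name, the two-argument getBundle signature, and both key strings are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ProjectionBundleSketch {
    // Hypothetical placeholder keys standing in for
    // AvroReadSupport.AVRO_REQUESTED_PROJECTION and
    // RuntimeParameters.DISABLE_COMBINE_FILE; the literal values are assumptions.
    static final String PROJECTION_KEY = "example.avro.projection";
    static final String DISABLE_COMBINE_KEY = "example.disable.combine.file";

    // Builds the input settings. If projectionSchemaJson is null, the full
    // ptype schema is used as the projection, which matches current behavior.
    static Map<String, String> getBundle(String ptypeSchemaJson, String projectionSchemaJson) {
        String projection = (projectionSchemaJson != null) ? projectionSchemaJson : ptypeSchemaJson;
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(PROJECTION_KEY, projection);
        // Combine-file input stays disabled, as in the existing method.
        conf.put(DISABLE_COMBINE_KEY, "true");
        return conf;
    }

    public static void main(String[] args) {
        String full = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"email\",\"type\":\"string\"}]}";
        // Projection that keeps only the "id" column.
        String idOnly = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"}]}";

        System.out.println(getBundle(full, null).get(PROJECTION_KEY));
        System.out.println(getBundle(full, idOnly).get(PROJECTION_KEY));
    }
}
```

Falling back to the ptype schema when no projection is given would keep existing callers unchanged; whether the projection should instead be carried by the source or the ptype itself is left open here.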



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
