crunch-dev mailing list archives

From "E. Sammer (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-480) AvroParquetFileSource doesn't properly configure user-supplied read schema
Date Thu, 06 Nov 2014 07:35:34 GMT
E. Sammer created CRUNCH-480:
--------------------------------

             Summary: AvroParquetFileSource doesn't properly configure user-supplied read schema
                 Key: CRUNCH-480
                 URL: https://issues.apache.org/jira/browse/CRUNCH-480
             Project: Crunch
          Issue Type: Bug
          Components: IO
    Affects Versions: 0.10.0
            Reporter: E. Sammer
            Priority: Blocker


It seems like AvroParquetFileSource doesn't properly set the configuration param required
to use a user-supplied read schema that differs from the schema in the file.

Deep in the guts of Parquet (InternalParquetRecordReader#initialize()), I found this:
{code}
   this.recordConverter = readSupport.prepareForRead(
        configuration, extraMetadata, fileSchema,
        new ReadSupport.ReadContext(requestedSchema, readSupportMetadata));
{code}

Later, Parquet's AvroReadSupport#prepareForRead() appears to ignore the supplied requestedSchema
and, instead, looks for the key avro.read.schema in the readSupportMetadata map. This is seriously
kooky code in Parquet (i.e. wrong), but because Crunch doesn't supply readSupportMetadata,
we can never properly supply a read schema. Boooo hisssss.
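
To make the failure mode concrete, here is a minimal, dependency-free sketch of the lookup as described above. It is not Parquet's code: the class ReadSchemaLookup and the method resolveReadSchema are hypothetical, schemas are represented as plain strings, and the fallback to the file schema when the key is absent is an assumption based on the observed behavior. Only the key name avro.read.schema comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the behavior described in this report; not Parquet's API.
class ReadSchemaLookup {
    // Metadata key the report says prepareForRead() looks for.
    static final String AVRO_READ_SCHEMA_KEY = "avro.read.schema";

    // Returns the schema that would be used for reading.
    // requestedSchema is deliberately unused, mirroring the reported behavior.
    static String resolveReadSchema(String requestedSchema,
                                    Map<String, String> readSupportMetadata,
                                    String fileSchema) {
        if (readSupportMetadata != null) {
            String fromMetadata = readSupportMetadata.get(AVRO_READ_SCHEMA_KEY);
            if (fromMetadata != null) {
                return fromMetadata;
            }
        }
        // Assumption: with no metadata entry, the file's own schema is used.
        return fileSchema;
    }

    public static void main(String[] args) {
        String fileSchema = "file-schema";
        String userSchema = "user-read-schema";

        // What the report says happens today: no metadata is supplied,
        // so the user's schema never reaches the reader.
        Map<String, String> metadata = new HashMap<>();
        System.out.println(resolveReadSchema(userSchema, metadata, fileSchema));

        // Populating the metadata key is what makes the user's schema take effect.
        metadata.put(AVRO_READ_SCHEMA_KEY, userSchema);
        System.out.println(resolveReadSchema(userSchema, metadata, fileSchema));
    }
}
```

In this model the only way to get the user's schema honored is to populate the metadata map, which suggests the fix on the Crunch side is to make sure that key ends up set before the reader is initialized.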



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
