Date: Wed, 12 Nov 2014 19:47:35 +0000 (UTC)
From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Subject: [jira] [Resolved] (CRUNCH-480) AvroParquetFileSource doesn't properly configure user-supplied read schema

     [ https://issues.apache.org/jira/browse/CRUNCH-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Wills resolved CRUNCH-480.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.12.0

Committed. Thanks everyone!

> AvroParquetFileSource doesn't properly configure user-supplied read schema
> --------------------------------------------------------------------------
>
>                 Key: CRUNCH-480
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-480
>             Project: Crunch
>          Issue Type: Bug
>          Components: IO
>    Affects Versions: 0.10.0
>            Reporter: E. Sammer
>            Assignee: Gabriel Reid
>            Priority: Blocker
>             Fix For: 0.12.0
>
>         Attachments: CRUNCH-480.1.patch, CRUNCH-480.2.patch, CRUNCH-480.3.patch, CRUNCH-480.patch
>
>
> It seems like AvroParquetFileSource doesn't properly set the configuration param required to use a user-supplied read schema that differs from the schema in the file.
> Deep in the guts of Parquet (InternalParquetReader#initialize()), I found this:
> {code}
> this.recordConverter = readSupport.prepareForRead(
>     configuration, extraMetadata, fileSchema,
>     new ReadSupport.ReadContext(requestedSchema, readSupportMetadata));
> {code}
> Later, in Parquet's AvroReadSupport#prepareForRead(), it appears to ignore the supplied requestedSchema and, instead, looks for the key avro.read.schema in the readSupportMetadata map. This is seriously kookie code in Parquet (i.e. wrong), but because Crunch doesn't supply readSupportMetadata, we can never properly supply a read schema. Boooo hisssss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
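
The lookup described in the quoted report can be illustrated with a small, library-free Java sketch. This is not Parquet's or Crunch's actual code: the class `ReadSchemaSketch`, the method `resolveReadSchema`, and the placeholder schema strings are hypothetical, and the fallback to the file schema when the key is absent is an assumption. Only the key name `avro.read.schema` and the idea that `requestedSchema` is ignored come from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the behaviour described in the report; not Parquet's real code.
final class ReadSchemaSketch {
    // Key name as given in the issue description.
    static final String AVRO_READ_SCHEMA = "avro.read.schema";

    // Mimics the reported behaviour: requestedSchema is accepted but never consulted.
    // Only the readSupportMetadata map decides whether a user-supplied schema is used.
    static String resolveReadSchema(String fileSchema,
                                    String requestedSchema,
                                    Map<String, String> readSupportMetadata) {
        if (readSupportMetadata != null && readSupportMetadata.containsKey(AVRO_READ_SCHEMA)) {
            return readSupportMetadata.get(AVRO_READ_SCHEMA);
        }
        // Assumed fallback: with no metadata entry, the schema stored in the file wins.
        return fileSchema;
    }

    public static void main(String[] args) {
        String fileSchema = "file-schema";   // placeholder for the schema stored in the file
        String userSchema = "user-schema";   // placeholder for the user-supplied read schema

        // Case 1: no readSupportMetadata supplied, as the report says Crunch did.
        System.out.println(resolveReadSchema(fileSchema, userSchema, null));

        // Case 2: the key is populated, so the user-supplied schema is picked up.
        Map<String, String> metadata = new HashMap<>();
        metadata.put(AVRO_READ_SCHEMA, userSchema);
        System.out.println(resolveReadSchema(fileSchema, userSchema, metadata));
    }
}
```

Under these assumptions, the first call yields the file schema even though a different schema was requested, which matches the symptom in the report; the second call shows why populating the metadata map (or whatever configuration feeds it) is what a fix needs to arrange.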