Return-Path:
X-Original-To: apmail-sqoop-dev-archive@www.apache.org
Delivered-To: apmail-sqoop-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 24320116AD for ; Mon, 1 Sep 2014 09:06:21 +0000 (UTC)
Received: (qmail 39672 invoked by uid 500); 1 Sep 2014 09:06:21 -0000
Delivered-To: apmail-sqoop-dev-archive@sqoop.apache.org
Received: (qmail 39630 invoked by uid 500); 1 Sep 2014 09:06:21 -0000
Mailing-List: contact dev-help@sqoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@sqoop.apache.org
Delivered-To: mailing list dev@sqoop.apache.org
Received: (qmail 39617 invoked by uid 99); 1 Sep 2014 09:06:20 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 01 Sep 2014 09:06:20 +0000
Date: Mon, 1 Sep 2014 09:06:20 +0000 (UTC)
From: "Qian Xu (JIRA)"
To: dev@sqoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Comment Edited] (SQOOP-1395) Use random generated class name for SqoopRecord
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/SQOOP-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117221#comment-14117221 ]

Qian Xu edited comment on SQOOP-1395 at 9/1/14 9:06 AM:
--------------------------------------------------------

[~jarcec] There are actually two places that use reflection to do a class lookup.

When a Kite Dataset is being created, an Avro schema should be provided. In that schema, the record type is actually the table name. Kite will try to verify the schema. The writerSchema is the Avro schema, but the readerSchema will come from the descendant SqoopRecord entity class.
{code}
// DataModelUtil.java
public static DatumReader getDatumReaderForType(Class type, Schema writerSchema) {
  Schema readerSchema = getReaderSchema(type, writerSchema);
  GenericData dataModel = getDataModelForType(type);
  // ...
}
{code}

When exporting Parquet files back to an RDBMS, {{AvroIndexedRecordConverter}} will instantiate a class based on the avroSchema. If the record type matches our entity class name, we are out of luck: the lookup returns that class, and it is instantiated in place of a generic record.

{code}
// AvroIndexedRecordConverter
public AvroIndexedRecordConverter(ParentValueContainer parent, GroupType parquetSchema, Schema avroSchema) {
  this.specificClass = SpecificData.get().getClass(avroSchema);
  // ...
}

public void start() {
  // Should do the right thing whether it is generic or specific
  this.currentRecord = (T) ((this.specificClass == null) ?
      new GenericData.Record(avroSchema) :
      SpecificData.newInstance(specificClass, avroSchema));
}
{code}


was (Author: stanleyxu2005):
[~jarcec] There are actually two places that use reflection to do a class lookup.

When a Kite Dataset is being created, an Avro schema should be provided. In that schema, the record type is actually the table name. Kite will try to verify the schema. The writerSchema is the Avro schema, but the readerSchema will come from the descendant SqoopRecord entity class.

{{
// DataModelUtil.java
public static DatumReader getDatumReaderForType(Class type, Schema writerSchema) {
  Schema readerSchema = getReaderSchema(type, writerSchema);
  GenericData dataModel = getDataModelForType(type);
}}

When exporting Parquet files back to an RDBMS, {{AvroIndexedRecordConverter}} will instantiate a class based on the avroSchema. If the record type matches our entity class name, we are out of luck: the lookup returns that class, and it is instantiated in place of a generic record.

{{
// AvroIndexedRecordConverter
public AvroIndexedRecordConverter(ParentValueContainer parent, GroupType parquetSchema, Schema avroSchema) {
  this.specificClass = SpecificData.get().getClass(avroSchema);
  // ...
}

public void start() {
  // Should do the right thing whether it is generic or specific
  this.currentRecord = (T) ((this.specificClass == null) ?
      new GenericData.Record(avroSchema) :
      SpecificData.newInstance(specificClass, avroSchema));
}
}}


> Use random generated class name for SqoopRecord
> -----------------------------------------------
>
>                 Key: SQOOP-1395
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1395
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>            Priority: Minor
>
> Sqoop generates an entity class to hold the values of every database record for MapReduce. The class inherits from the abstract class SqoopRecord. By default, the name of the class is the table name.
> When exporting records as Parquet files, the internal logic will attempt to instantiate another entity class or create it on demand. Unfortunately, the target class has the same name as the one Sqoop generated.
> This JIRA proposes using a random class name to avoid the potential problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
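
The proposal in the issue description above can be illustrated with a minimal sketch. It is not Sqoop's actual implementation: the helper names randomClassName and lookup, the "codegen_" prefix, and the UUID-based suffix are all assumptions made for illustration, and Class.forName only stands in for the reflection-based lookup that the comment attributes to SpecificData.get().getClass(avroSchema). The sketch also assumes the table name is already a valid Java identifier.

```java
import java.util.UUID;

class RandomClassNameSketch {

    // Hypothetical helper: derives a class name that is very unlikely to
    // match the record type name (the table name) stored in the Avro schema.
    static String randomClassName(String tableName) {
        // A UUID without dashes keeps the result a valid Java identifier.
        String suffix = UUID.randomUUID().toString().replace("-", "");
        return "codegen_" + tableName + "_" + suffix;
    }

    // Stand-in for a reflection-based lookup by name: returns the class if
    // one with that name can be loaded, or null otherwise.
    static Class<?> lookup(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String tableName = "orders"; // hypothetical table name

        // Two generated names for the same table differ from each other.
        String first = randomClassName(tableName);
        String second = randomClassName(tableName);
        System.out.println(first);
        System.out.println(second);

        // A lookup by the plain table name no longer has a chance of
        // resolving to the generated class, because the names differ.
        System.out.println("generated name equals table name: "
                + first.equals(tableName));

        // A name that does resolve shows what a collision looks like:
        // the lookup succeeds and returns a concrete class.
        System.out.println("lookup of an existing class succeeded: "
                + (lookup("java.lang.String") != null));
    }
}
```

With a randomized name, a lookup keyed on the table name falls through to the "no specific class" branch, which in the quoted AvroIndexedRecordConverter code means a GenericData.Record is created.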