sqoop-dev mailing list archives

From "Qian Xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-1395) Use random generated class name for SqoopRecord
Date Tue, 02 Sep 2014 16:38:21 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qian Xu updated SQOOP-1395:
---------------------------
    Description: 
If you import a table named "users", Sqoop generates an entity class in "users.java". The
class is compiled, submitted, and used by a MapReduce job. If the target file format is
Avro or Parquet, an Avro schema is generated as well. In Avro terms, the entity class is
described as a "record", and the name of that record is "users".

For Parquet file format handling, we use the Kite SDK to manage Parquet file reading and writing
with minimal effort. Kite requires an Avro schema and all data records to be packed into
GenericRecord instances. This is where the problem arises. Kite reads the schema first and
tries to instantiate a record class based on the record name. In this case, Kite tries to
instantiate a "users" class. Unfortunately, a class with that name already exists: the one
generated as "users.java". This causes the MapReduce job to fail.

To solve this problem, I intend to give the entity class and the Avro record different
names.

The patch will:

Change the record name in the Avro schema.
Remove SqoopAvroRecord, as it is no longer required (ClassWriter.java is reverted to its
previous state).
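
The naming idea above can be sketched in plain Java. This is only an illustration under stated assumptions, not the content of the attached patch: the "sqoop_import_" prefix, the class RecordNameSketch, and its helper methods are hypothetical, and the Class.forName lookup merely stands in for the name-based instantiation described above (Kite's actual lookup is not shown in this message). The Avro schema is built as plain JSON text so that the sketch needs no Avro or Kite dependency.

```java
final class RecordNameSketch {
    // Hypothetical prefix; the actual patch may choose the record name differently.
    static final String RECORD_PREFIX = "sqoop_import_";

    /** Derives an Avro record name that differs from the entity class name. */
    static String avroRecordName(String entityClassName) {
        return RECORD_PREFIX + entityClassName;
    }

    /** Stand-in for a name-based lookup: does this name resolve to a loadable class? */
    static boolean resolvesToClass(String recordName) {
        try {
            // Load without initializing; we only care whether the name resolves.
            Class.forName(recordName, false, RecordNameSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Builds a minimal Avro record schema as JSON text (field list left empty). */
    static String schemaJson(String recordName) {
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":[]}";
    }

    public static void main(String[] args) {
        String entity = users.class.getName();   // name of the generated entity class
        String record = avroRecordName(entity);  // distinct name for the Avro record
        System.out.println(entity + " resolves to a class: " + resolvesToClass(entity));
        System.out.println(record + " resolves to a class: " + resolvesToClass(record));
        System.out.println(schemaJson(record));
    }
}

/** Stand-in for the entity class Sqoop would generate for a table named "users". */
class users { }
```

With the record named after the table, a name-based lookup finds the generated entity class. With a distinct record name, the lookup finds nothing, so the record can be handled as a generic record.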

  was:
Sqoop generates an entity class to hold the values of each database record for MapReduce.
The class inherits from the abstract class SqoopRecord. By default, the class is named
after the table.

When exporting records as Parquet files, the internal logic attempts to instantiate another
entity class or create it on demand. Unfortunately, the target class has the same name as
the one Sqoop generated.

This JIRA proposes using a random class name to avoid the potential problem.


> Use random generated class name for SqoopRecord
> -----------------------------------------------
>
>                 Key: SQOOP-1395
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1395
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: tools
>            Reporter: Qian Xu
>            Assignee: Qian Xu
>            Priority: Minor
>         Attachments: SQOOP-1395.2.patch, SQOOP-1395.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
