From: "Hudson (JIRA)"
To: dev@sqoop.apache.org
Reply-To: dev@sqoop.apache.org
Date: Wed, 11 Jan 2017 12:43:58 +0000 (UTC)
Subject: [jira] [Commented] (SQOOP-3068) Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)

    [ https://issues.apache.org/jira/browse/SQOOP-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818208#comment-15818208 ]

Hudson commented on SQOOP-3068:
-------------------------------

SUCCESS: Integrated in Jenkins build Sqoop-hadoop20 #1085 (See [https://builds.apache.org/job/Sqoop-hadoop20/1085/])
SQOOP-3068: Enhance error (tool.ImportTool: Encountered IOException (maugli: [https://git-wip-us.apache.org/repos/asf?p=sqoop.git&a=commit&h=c0b9bc435396600b7b8798190c67ce77aa81a02c])
* (edit) src/java/org/apache/sqoop/mapreduce/ParquetJob.java
* (edit) src/test/org/apache/sqoop/tool/TestImportTool.java
* (add) src/java/org/apache/sqoop/avro/AvroSchemaMismatchException.java
* (edit) src/java/org/apache/sqoop/tool/ImportTool.java
* (add) src/test/org/apache/sqoop/util/ExpectedLogMessage.java
* (edit) src/test/com/cloudera/sqoop/hive/TestHiveImport.java


> Enhance error (tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema) to suggest workaround (--map-column-java)
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3068
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3068
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: hive-integration
>    Affects Versions: 1.4.6
>            Reporter: Markus Kemper
>            Assignee: Szabolcs Vasas
>         Attachments: SQOOP-3068.patch, SQOOP-3068.patch, SQOOP-3068.patch, SQOOP-3068.patch
>
> Please consider enhancing the error to include more detail and to suggest the workaround (--map-column-java).
> Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch between the JSON (Avro) schema that Hive produces and the JSON (Avro) schema that Sqoop generates.
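
Condensed from the error output quoted in the reproduction steps further down, the mismatch comes down to two columns:

{noformat}
Hive-generated (expected): {"name":"c_date","type":["null","string"],...}, {"name":"c_timestamp","type":["null","string"],...}
Sqoop-generated (actual):  {"name":"c_date","type":["null","long"],...},   {"name":"c_timestamp","type":["null","long"],...}
{noformat}

Hive stores the DATE and TIMESTAMP columns of the Parquet table as string, while Sqoop maps them to long, so the schema comparison against the existing table fails.
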
> The test case below demonstrates how to reproduce the issue as well as how to work around it.
>
> SETUP (create parquet table with Sqoop import and Beeline CTAS)
> {noformat}
> STEP 01: Create MySQL Table
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
> ---------------------------------------------------------------------------------------------------------
> | Field       | Type      | Null | Key | Default           | Extra                       |
> ---------------------------------------------------------------------------------------------------------
> | c_int       | int(11)   | YES  |     | (null)            |                             |
> | c_date      | date      | YES  |     | (null)            |                             |
> | c_timestamp | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
> ---------------------------------------------------------------------------------------------------------
>
> STEP 02: Insert and Select Row
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> --------------------------------------------------
> | c_int | c_date     | c_timestamp           |
> --------------------------------------------------
> | 1     | 2016-10-26 | 2016-10-26 14:30:33.0 |
> --------------------------------------------------
>
> STEP 03: Create Hive Tables
> beeline -u jdbc:hive2:// -e "use default; drop table t1"
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1
> beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as select * from t1_text; show create table t1;"
> +----------------------------------------------------+--+
> |                   createtab_stmt                    |
> +----------------------------------------------------+--+
> | CREATE TABLE `t1`(                                  |
> |   `c_int` int,                                      |
> |   `c_date` string,                                  |
> |   `c_timestamp` string)                             |
> | ROW FORMAT SERDE                                    |
> |   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' |
> | STORED AS INPUTFORMAT                               |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' |
> | OUTPUTFORMAT                                        |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
> | LOCATION                                            |
> |   'hdfs://:8020/user/hive/warehouse/t1'             |
> | TBLPROPERTIES (                                     |
> |   'COLUMN_STATS_ACCURATE'='true',                   |
> |   'numFiles'='1',                                   |
> |   'numRows'='2',                                    |
> |   'rawDataSize'='6',                                |
> |   'totalSize'='605',                                |
> |   'transient_lastDdlTime'='1478298298')             |
> +----------------------------------------------------+--+
> {noformat}
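
Before moving on to the failure, it can help to confirm the schema Hive actually wrote for t1, since that is the "Expected schema" side of the error below. A minimal check; the describe call mirrors the beeline usage above, while the parquet-tools invocation and part-file name are assumptions (adjust the jar name and path to your cluster):

{noformat}
# Confirm that Hive typed c_date and c_timestamp as string in the Parquet table
beeline -u jdbc:hive2:// -e "use default; describe t1;"

# Optionally inspect the schema embedded in the Parquet file itself
# (file name below is illustrative)
hdfs dfs -ls /user/hive/warehouse/t1
hadoop jar parquet-tools.jar schema /user/hive/warehouse/t1/000000_0
{noformat}
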
>
> REPRODUCE ISSUE (import --hive-import append and overwrite)
> {noformat}
> STEP 01: Attempt --hive-import --append
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --append
> 16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending.
> 16/11/02 20:54:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record","name":"t1","fields":[{"name":"c_int","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"c_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"c_timestamp","type":["null","string"],"doc":"Converted from 'string'","default":null}]}
> Actual schema: {"type":"record","name":"t1","doc":"Sqoop import of t1","fields":[{"name":"c_int","type":["null","int"],"default":null,"columnName":"c_int","sqlType":"4"},{"name":"c_date","type":["null","long"],"default":null,"columnName":"c_date","sqlType":"91"},{"name":"c_timestamp","type":["null","long"],"default":null,"columnName":"c_timestamp","sqlType":"93"}],"tableName":"t1"}
>
> STEP 02: Attempt --hive-import --hive-overwrite
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --hive-overwrite
> 16/11/02 20:56:55 INFO hive.metastore: Connected to metastore.
> 16/11/02 20:56:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record","name":"t1","fields":[{"name":"c_int","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"c_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"c_timestamp","type":["null","string"],"doc":"Converted from 'string'","default":null}]}
> Actual schema: {"type":"record","name":"t1","doc":"Sqoop import of t1","fields":[{"name":"c_int","type":["null","int"],"default":null,"columnName":"c_int","sqlType":"4"},{"name":"c_date","type":["null","long"],"default":null,"columnName":"c_date","sqlType":"91"},{"name":"c_timestamp","type":["null","long"],"default":null,"columnName":"c_timestamp","sqlType":"93"}],"tableName":"t1"}
> {noformat}
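
The workaround the summary points to is --map-column-java: override Sqoop's Java type mapping for the DATE and TIMESTAMP columns so that the generated Avro schema carries string fields matching the columns Hive created. A sketch of such a run, reusing the environment variables from the setup; the exact command and column mappings are an illustration inferred from the schemas above, not a transcript from the original report:

{noformat}
WORKAROUND (map the mismatched columns to String so the schemas agree)
sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --map-column-java c_date=String,c_timestamp=String --num-mappers 1 --append
{noformat}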