Date: Wed, 8 Mar 2017 12:34:38 +0000 (UTC)
From: "Ahmed Kamal (JIRA)"
To: dev@sqoop.apache.org
Reply-To: dev@sqoop.apache.org
Subject: [jira] [Updated] (SQOOP-3147) Import data to Hive Table in S3 in Parquet format

     [ https://issues.apache.org/jira/browse/SQOOP-3147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Kamal updated SQOOP-3147:
-------------------------------
    Summary: Import data to Hive Table in S3 in Parquet format  (was: Import data from MySQL To Hive Table in S3 in Parquet format)

> Import data to Hive Table in S3 in Parquet format
> -------------------------------------------------
>
>                 Key: SQOOP-3147
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3147
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.6
>            Reporter: Ahmed Kamal
>
> Running the import command below succeeds only if the Hive table's location is in HDFS.
> If the table is backed by S3, it throws an exception while trying to move the data from the HDFS tmp directory to S3 (a sketch of why this cross-filesystem rename fails is appended at the end of this message):
>
> Job job_1486539699686_3090 failed with state FAILED due to: Job commit failed: org.kitesdk.data.DatasetIOException: Dataset merge failed
> at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:333)
> at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:56)
> at org.kitesdk.data.mapreduce.DatasetKeyOutputFormat$MergeOutputCommitter.commitJob(DatasetKeyOutputFormat.java:370)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Dataset merge failed during rename of hdfs://hdfs-path/tmp/dev_kamal/.temp/job_1486539699686_3090/mr/job_1486539699686_3090/0192f987-bd4c-4cb7-836f-562ac483e008.parquet to s3://bucket_name/dev_kamal/address/0192f987-bd4c-4cb7-836f-562ac483e008.parquet
> at org.kitesdk.data.spi.filesystem.FileSystemDataset.merge(FileSystemDataset.java:329)
> ... 7 more
>
> The command used:
>
> sqoop import --connect "jdbc:mysql://connectionUrl" --table "tableName" \
>     --as-parquetfile --verbose --username=uname --password=pass \
>     --hive-import --delete-target-dir --hive-database dev_kamal \
>     --hive-table customer_car_type --hive-overwrite -m 150
>
> Another issue I noticed is that Sqoop stores the Avro schema in TBLPROPERTIES under the avro.schema.literal attribute; if the table has many columns, the schema gets truncated, which causes an exception like the one below (a minimal reproduction is appended at the end of this message).
>
> *Exception:*
> 17/03/07 12:13:13 INFO hive.metastore: Trying to connect to metastore with URI thrift://ip-10-0-0-47.eu-west-1.compute.internal:9083
> 17/03/07 12:13:13 INFO hive.metastore: Opened a connection to metastore, current connections: 1
> 17/03/07 12:13:13 INFO hive.metastore: Connected to metastore.
> 17/03/07 12:13:17 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@3e9b1010
> 17/03/07 12:13:17 ERROR sqoop.Sqoop: Got exception running Sqoop: org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
> at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> org.apache.avro.SchemaParseException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
> at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> at org.apache.avro.Schema$Parser.parse(Schema.java:929)
> at org.apache.avro.Schema$Parser.parse(Schema.java:917)
> at org.kitesdk.data.DatasetDescriptor$Builder.schemaLiteral(DatasetDescriptor.java:475)
> at org.kitesdk.data.spi.hive.HiveUtils.descriptorForTable(HiveUtils.java:154)
> at org.kitesdk.data.spi.hive.HiveAbstractMetadataProvider.load(HiveAbstractMetadataProvider.java:104)
> at org.kitesdk.data.spi.filesystem.FileSystemDatasetRepository.load(FileSystemDatasetRepository.java:192)
> at org.kitesdk.data.Datasets.load(Datasets.java:108)
> at org.kitesdk.data.Datasets.load(Datasets.java:165)
> at org.kitesdk.data.Datasets.load(Datasets.java:187)
> at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:78)
> at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108)
> at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
> at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
> at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
> at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
> at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
> at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
> at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
> at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
> Caused by: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
> at [Source: java.io.StringReader@3fb42ec7; line: 1, column: 6001]
> at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
> at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
> at org.codehaus.jackson.impl.JsonParserMinimalBase._reportInvalidEOF(JsonParserMinimalBase.java:454)
> at org.codehaus.jackson.impl.ReaderBasedParser._finishString2(ReaderBasedParser.java:1342)
> at org.codehaus.jackson.impl.ReaderBasedParser._finishString(ReaderBasedParser.java:1330)
> at org.codehaus.jackson.impl.ReaderBasedParser.getText(ReaderBasedParser.java:200)
> at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:203)
> at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:224)
> at org.codehaus.jackson.map.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:200)
> at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:58)
> at org.codehaus.jackson.map.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
> at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2704)
> at org.codehaus.jackson.map.ObjectMapper.readTree(ObjectMapper.java:1344)
> at org.apache.avro.Schema$Parser.parse(Schema.java:927)
> ... 21 more
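For context on the first trace: Hadoop's FileSystem.rename() only works within a single filesystem, which is why Kite's FileSystemDataset.merge() fails when the source is on HDFS and the target on S3; moving data across filesystems requires a copy instead. Below is a minimal sketch of that distinction using the stock Hadoop FileSystem API. The class name and paths are made up for illustration (this is not Sqoop's or Kite's actual code), and it assumes an S3 filesystem implementation is configured, e.g. on EMR.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class CrossFsMove {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Illustrative paths only; the real ones come from the job's temp dir.
            Path src = new Path("hdfs://namenode/tmp/job/part-0.parquet");
            Path dst = new Path("s3://bucket/table/part-0.parquet");

            FileSystem srcFs = src.getFileSystem(conf); // HDFS
            FileSystem dstFs = dst.getFileSystem(conf); // S3

            if (srcFs.getUri().equals(dstFs.getUri())) {
                // Same filesystem: a cheap, metadata-only rename is enough.
                boolean renamed = srcFs.rename(src, dst);
                System.out.println("renamed: " + renamed);
            } else {
                // Different filesystems: rename cannot cross them; the bytes
                // must be copied over and the source deleted afterwards.
                FileUtil.copy(srcFs, src, dstFs, dst, /* deleteSource= */ true, conf);
            }
        }
    }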
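For the second trace: if the avro.schema.literal property read back from the metastore is cut off partway through the JSON, Avro's parser hits end-of-input mid-string, exactly as reported above (line 1, column 6001). The following is a minimal, self-contained reproduction of that parse error, assuming only the Avro library on the classpath; the schema and class name are invented for the demo.

    import org.apache.avro.Schema;

    public class TruncatedLiteralDemo {
        public static void main(String[] args) {
            // A tiny, made-up record schema standing in for a wide table schema.
            String literal = "{\"type\":\"record\",\"name\":\"t\",\"fields\":"
                    + "[{\"name\":\"c1\",\"type\":\"string\"}]}";
            // Simulate the stored property value being cut off inside the
            // last string value, as described in the report above.
            String truncated = literal.substring(0, literal.length() - 6);
            // Throws org.apache.avro.SchemaParseException; with Avro 1.7.x
            // (as in Sqoop 1.4.6) it wraps the same org.codehaus.jackson
            // JsonParseException ("Unexpected end-of-input") seen in the trace.
            new Schema.Parser().parse(truncated);
        }
    }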