Date: Fri, 28 Jul 2017 22:15:00 +0000 (UTC)
From: "Mridul Muralidharan (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Comment Edited] (SPARK-21549) Spark fails to complete job correctly in case of OutputFormat which do not write into hdfs

[ https://issues.apache.org/jira/browse/SPARK-21549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105630#comment-16105630 ]

Mridul Muralidharan edited comment on SPARK-21549 at 7/28/17 10:14 PM:
-----------------------------------------------------------------------

This affects OutputFormats based on both the mapred API ("mapred.output.dir") and the mapreduce API ("mapreduce.output.fileoutputformat.outputdir") that do not set the referenced property, and it is an incompatibility introduced in Spark 2.2. The workaround is to explicitly set the property to a dummy value (any path that is valid and writable by the user).
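The workaround described above can be applied at job setup. A minimal sketch, assuming an existing SparkContext named `sc`; the dummy path is a placeholder, and any directory the submitting user can write to will do:

```scala
// Hedged workaround sketch: pre-populate the properties Spark 2.2 reads,
// so the committer never sees a null output path.
// Assumes an existing SparkContext `sc`; "/tmp/spark-dummy-output" is a
// placeholder path, not a value mandated by Spark.
sc.hadoopConfiguration.set(
  "mapreduce.output.fileoutputformat.outputdir", "/tmp/spark-dummy-output") // mapreduce API
sc.hadoopConfiguration.set(
  "mapred.output.dir", "/tmp/spark-dummy-output")                           // mapred API
```

Set these before calling saveAsNewAPIHadoopDataset (or the mapred equivalent), so the committer instantiation picks them up.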
> Spark fails to complete job correctly in case of OutputFormat which do not write into hdfs
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21549
>                 URL: https://issues.apache.org/jira/browse/SPARK-21549
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>        Environment: spark 2.2.0
>                     scala 2.11
>            Reporter: Sergey Zhemzhitsky
>
> Spark fails to complete jobs correctly for custom OutputFormat implementations.
> Some OutputFormat implementations have no need for the standard Hadoop property *mapreduce.output.fileoutputformat.outputdir*.
> [But Spark reads this property from the configuration|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala#L79] while setting up an OutputCommitter:
> {code:scala}
> val committer = FileCommitProtocol.instantiate(
>   className = classOf[HadoopMapReduceCommitProtocol].getName,
>   jobId = stageId.toString,
>   outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>   isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
> committer.setupJob(jobContext)
> {code}
> ...
> ... and then uses this property later while [committing the job|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L132], [aborting the job|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L141], and [creating the task's temp path|https://github.com/apache/spark/blob/v2.2.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L95].
> In these cases, when the job completes, the following exception is thrown:
> {code}
> Can not create a Path from a null string
> java.lang.IllegalArgumentException: Can not create a Path from a null string
> 	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:135)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:89)
> 	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.absPathStagingDir(HadoopMapReduceCommitProtocol.scala:58)
> 	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.abortJob(HadoopMapReduceCommitProtocol.scala:141)
> 	at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:106)
> 	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
> 	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
> 	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> 	at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
> 	at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
> ...
> {code}
> So it seems that all jobs using OutputFormats which don't write data into HDFS-compatible file systems are broken.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
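As an aside, the failure pattern quoted above can be reproduced in miniature without Spark or Hadoop. Everything below is an illustrative sketch: the config map, the `makePath` helper, and the sample key stand in for Spark's committer code and Hadoop's Path class; this is not the actual implementation.

```scala
// Illustrative sketch, not Spark's code: a config map with no output dir,
// a stand-in for Hadoop Path's null check, and the resulting failure.
val conf = Map("mapreduce.job.name" -> "custom-output-job") // output dir never set

// Mimics org.apache.hadoop.fs.Path rejecting a null path string.
def makePath(s: String): String =
  if (s == null)
    throw new IllegalArgumentException("Can not create a Path from a null string")
  else s

// What the committer effectively does: the lookup yields null here.
val outputDir: String =
  conf.getOrElse("mapreduce.output.fileoutputformat.outputdir", null)

val aborted =
  try { makePath(outputDir); false }
  catch { case _: IllegalArgumentException => true }
// aborted == true: commit/abort fails even though this job's
// OutputFormat never needed a filesystem path.

// A null-safe alternative: represent absence explicitly with Option.
val safeOutputDir: Option[String] =
  conf.get("mapreduce.output.fileoutputformat.outputdir")
// safeOutputDir.isEmpty == true, and no exception is thrown.
```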