pig-dev mailing list archives

From "Koji Noguchi (Jira)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5319) Investigate why TestStoreInstances fails with Spark 2.2
Date Tue, 15 Jun 2021 19:31:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363875#comment-17363875 ]

Koji Noguchi commented on PIG-5319:
-----------------------------------

I do see OutputFormat being created twice when using Spark 2.4 (the two call sites are marked *** below).
{code:java|title=SparkHadoopWriter.scala}
117     committer.setupTask(taskContext)  // ***
118
119     // Initiate the writer.
120     config.initWriter(taskContext, sparkPartitionId)  // ***
{code}
setupTask and initWriter each create their own, separate OutputFormat instance.

Call trace for each:
{noformat}
SparkHadoopWriter.scala:117     committer.setupTask(taskContext)
--> HadoopMapReduceCommitProtocol.scala:217 setupCommitter(taskContext)
--> --> HadoopMapReduceCommitProtocol.scala:94     val format = context.getOutputFormatClass.newInstance()
{noformat}
and
{noformat}
SparkHadoopWriter.scala:120     config.initWriter(taskContext, sparkPartitionId)
--> SparkHadoopWriter.scala:343     val taskFormat = getOutputFormat()
--> --> SparkHadoopWriter.scala:384     outputFormat.newInstance()
{noformat}
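
To illustrate why two instances matter, here is a minimal, dependency-free Java sketch. Everything in it is hypothetical (TwoInstancesSketch, StatefulFormat, and the static setupTask/initWriter helpers); it only mimics the two reflective newInstance() calls in the traces above and does not use any Spark, Hadoop, or Pig classes.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoInstancesSketch {

    /** Hypothetical stand-in for an OutputFormat that keeps per-instance state. */
    public static class StatefulFormat {
        public static int instancesCreated = 0;            // counts constructor calls
        public final List<String> records = new ArrayList<>();

        public StatefulFormat() {
            instancesCreated++;
        }
    }

    /** Creates a fresh instance reflectively, like the newInstance() calls in the traces. */
    private static StatefulFormat newInstance(Class<? extends StatefulFormat> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + cls, e);
        }
    }

    /** Assumed analogue of the committer path (setupTask -> setupCommitter). */
    static StatefulFormat setupTask(Class<? extends StatefulFormat> cls) {
        return newInstance(cls);
    }

    /** Assumed analogue of the writer path (initWriter -> getOutputFormat). */
    static StatefulFormat initWriter(Class<? extends StatefulFormat> cls) {
        return newInstance(cls);
    }

    /** Writes one record on the writer side and reports what the committer side sees. */
    public static int recordsSeenByCommitter() {
        StatefulFormat.instancesCreated = 0;
        StatefulFormat committerSide = setupTask(StatefulFormat.class);
        StatefulFormat writerSide = initWriter(StatefulFormat.class);
        writerSide.records.add("row-1");
        return committerSide.records.size();
    }

    public static void main(String[] args) {
        System.out.println("records seen by committer: " + recordsSeenByCommitter());
        System.out.println("instances created: " + StatefulFormat.instancesCreated);
    }
}
```

Running main reports 0 records seen on the committer side and 2 instances created, so state kept on one format instance is not visible to the other. Whether this is exactly what trips TestStoreInstances is not established here.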
 

> Investigate why TestStoreInstances fails with Spark 2.2
> -------------------------------------------------------
>
>                 Key: PIG-5319
>                 URL: https://issues.apache.org/jira/browse/PIG-5319
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Nándor Kollár
>            Priority: Major
>
> TestStoreInstances unit test fails with Spark 2.2.x. It seems the job and task commit logic
> has changed a lot since Spark 2.1.x: Spark now appears to use one PigOutputFormat when
> writing to files and a different one when getting the OutputCommitters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
