beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2029) NullPointerException while evaluating GroupByKey in Spark Runner in streaming mode
Date Mon, 24 Apr 2017 09:23:04 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980920#comment-15980920
] 

ASF GitHub Bot commented on BEAM-2029:
--------------------------------------

GitHub user echauchot opened a pull request:

    https://github.com/apache/beam/pull/2659

    [BEAM-2029] null pointer exception while evaluating group by key in spark runner in streaming mode

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
    
     - [X] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [X] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [X] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
    
    ---
    R: @aviemzur 
    This PR currently only adds a test that raises a NullPointerException in the Spark runner,
    so it cannot be merged as is. I do not know the Spark runner well enough yet to write the fix,
    but I suspect that the problem is related to the `withFanout()` support in the Spark runner.

    
    @aviemzur, can you take a look at the test and write the fix?
    Since the fix is needed for the build to pass before this PR can be merged, what do you think
    about forking this PR and doing the fix there? I will merge your PR, and we will put the
    combined PR (test + fix) up for review, taking care that your commits do not get squashed.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/echauchot/beam BEAM-2029-NullPointerException_while_evaluating_GroupByKey_in_Spark_Runner_in_streaming_mode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2659
    
----
commit b68f0c5c2b507db7b66ee451208c0e19071ad7fd
Author: Etienne Chauchot <echauchot@gmail.com>
Date:   2017-04-21T14:41:48Z

    [BEAM-2029] NullPointerException while evaluating GroupByKey in Spark Runner in streaming mode
    Add test to reproduce the NullPointerException

----


> NullPointerException while evaluating GroupByKey in Spark Runner in streaming mode
> ----------------------------------------------------------------------------------
>
>                 Key: BEAM-2029
>                 URL: https://issues.apache.org/jira/browse/BEAM-2029
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> To reproduce the bug, run Nexmark query 7 (https://github.com/iemejia/beam/tree/BEAM-160-nexmark).
> Run main in {{org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver}}
> with VM options:
> {code}
> -Dspark.ui.enabled=false -DSPARK_LOCAL_IP=localhost -Dsun.io.serialization.extendedDebugInfo=true
> {code}
> with Program arguments:
> {code}
> --query=7 --streamTimeout=1200 --streaming=true --numEventGenerators=4 --manageResources=false --monitorJobs=true --enforceEncodability=false --enforceImmutability=false
> {code}
> Behavior:
> {{context.borrowDataset(transform)}} returns null.
> Stack trace:
> {code}
> 17/04/20 15:00:58 INFO org.apache.beam.runners.spark.SparkRunner$Evaluator: Evaluating GroupByKey
> Exception in thread "main" java.lang.NullPointerException
> 	at org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$6.evaluate(StreamingTransformTranslator.java:272)
> 	at org.apache.beam.runners.spark.translation.streaming.StreamingTransformTranslator$6.evaluate(StreamingTransformTranslator.java:267)
> 	at org.apache.beam.runners.spark.SparkRunner$Evaluator.doVisitTransform(SparkRunner.java:409)
> 	at org.apache.beam.runners.spark.SparkRunner$Evaluator.visitPrimitiveTransform(SparkRunner.java:395)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:488)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:483)
> 	at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:232)
> 	at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:207)
> 	at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:384)
> 	at org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:88)
> 	at org.apache.beam.runners.spark.translation.streaming.SparkRunnerStreamingContextFactory.call(SparkRunnerStreamingContextFactory.java:47)
> 	at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$10.apply(JavaStreamingContext.scala:776)
> 	at org.apache.spark.streaming.api.java.JavaStreamingContext$$anonfun$10.apply(JavaStreamingContext.scala:775)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:864)
> 	at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:775)
> 	at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
> 	at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:155)
> 	at org.apache.beam.runners.spark.SparkRunner.run(SparkRunner.java:85)
> 	at org.apache.beam.sdk.Pipeline.run(Pipeline.java:276)
> 	at org.apache.beam.integration.nexmark.NexmarkRunner.run(NexmarkRunner.java:1232)
> 	at org.apache.beam.integration.nexmark.NexmarkDriver.runAll(NexmarkDriver.java:69)
> 	at org.apache.beam.integration.nexmark.drivers.NexmarkSparkDriver.main(NexmarkSparkDriver.java:46)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
> {code}
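
The report above says that `context.borrowDataset(transform)` returns null and that the evaluator then fails with a NullPointerException. The sketch below illustrates only that failure mode. It does not use the Spark runner's real classes: `SketchContext`, its map-backed `borrowDataset`, and the `requireDataset` helper are hypothetical names introduced for illustration, and datasets are represented as plain strings. It shows how a fail-fast check could name the transform whose input is missing, which may make this kind of bug easier to locate. It is not the fix for BEAM-2029.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: a hypothetical stand-in, not the Spark runner's evaluation context. */
public class BorrowDatasetSketch {

    /** Maps a transform name to the dataset registered as its input (assumed model). */
    static final class SketchContext {
        private final Map<String, String> datasets = new HashMap<>();

        void putDataset(String transformName, String dataset) {
            datasets.put(transformName, dataset);
        }

        /** Returns null when nothing was registered, mirroring the reported behavior. */
        String borrowDataset(String transformName) {
            return datasets.get(transformName);
        }

        /** Fail-fast variant: the exception names the transform with no input dataset. */
        String requireDataset(String transformName) {
            String dataset = borrowDataset(transformName);
            if (dataset == null) {
                throw new IllegalStateException(
                    "No dataset registered for transform: " + transformName);
            }
            return dataset;
        }
    }

    public static void main(String[] args) {
        SketchContext context = new SketchContext();
        context.putDataset("ParDo", "dstream-1");

        // Unchecked lookup: dereferencing the null result throws NullPointerException.
        try {
            context.borrowDataset("GroupByKey").length();
        } catch (NullPointerException e) {
            System.out.println("unchecked lookup: NullPointerException");
        }

        // Checked lookup: the failure message identifies the transform.
        try {
            context.requireDataset("GroupByKey");
        } catch (IllegalStateException e) {
            System.out.println("checked lookup: " + e.getMessage());
        }
    }
}
```

In this sketch the unchecked path fails without saying which transform was involved, while the checked path reports the transform name at the point of the lookup.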



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
