beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jesse Anderson (JIRA)" <>
Subject [jira] [Commented] (BEAM-645) Running Wordcount in Spark Checks Locally and Outputs in HDFS
Date Tue, 20 Sep 2016 18:56:20 GMT


Jesse Anderson commented on BEAM-645:

No, it doesn't happen with {{org.apache.beam.runners.spark.examples.WordCount}}.

I think the issue is subtly different. With Spark, you can define a default fs. You could
define it to be file: or hdfs: and that knowledge won't be transferred from the Spark runner
to the TextIO.

There more information on replicating here if you need it

> Running Wordcount in Spark Checks Locally and Outputs in HDFS
> -------------------------------------------------------------
>                 Key: BEAM-645
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 0.3.0-incubating
>            Reporter: Jesse Anderson
>            Assignee: Amit Sela
> When running the Wordcount example with the Spark runner, the Spark runner uses the input
file in HDFS. When the program performs its startup checks, it looks for the file in the local
> To workaround this issue, you have to create a file in the local filesystem and put the
actual file in HDFS.
> Here is the stack trace when the file doesn't exist in the local filesystem:
> {quote}Exception in thread "main" java.lang.IllegalStateException: Unable to find any
files matching Macbeth.txt
> 	at
> 	at$Read$Bound.apply(
> 	at$Read$Bound.apply(
> 	at org.apache.beam.sdk.runners.PipelineRunner.apply(
> 	at org.apache.beam.runners.spark.SparkRunner.apply(
> 	at org.apache.beam.sdk.Pipeline.applyInternal(
> 	at org.apache.beam.sdk.Pipeline.applyTransform(
> 	at org.apache.beam.sdk.values.PBegin.apply(
> 	at org.apache.beam.sdk.Pipeline.apply(
> 	at org.apache.beam.examples.WordCount.main(
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> 	at java.lang.reflect.Method.invoke(
> 	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
> 	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
> 	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {quote}

This message was sent by Atlassian JIRA

View raw message