beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Punit Naik <naik.puni...@gmail.com>
Subject Re: Direct Runner Example
Date Wed, 11 May 2016 06:29:38 GMT
I tried to change the default DataFlow pipeline to a DirectRunner pipeline
by doing this:

DirectPipelineOptions options =
PipelineOptionsFactory.as(DirectPipelineOptions.class);
Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.named("ReadLines").from("file:///home/punit/factordb_setup.txt"))
     .apply(new CountWords())
     .apply(MapElements.via(new FormatAsTextFn()))

.apply(TextIO.Write.named("WriteCounts").to("file:///home/punit/beam-out"));
p.run();

But when I ran this it threw this error:

SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
details.
Exception in thread "main" java.lang.IllegalStateException: Failed to
validate file:///home/user/file.txt
    at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:307)
    at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:204)
    at
org.apache.beam.sdk.runners.PipelineRunner.apply(PipelineRunner.java:76)
    at
org.apache.beam.sdk.runners.DirectPipelineRunner.apply(DirectPipelineRunner.java:253)
    at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:369)
    at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:276)
    at org.apache.beam.sdk.values.PBegin.apply(PBegin.java:48)
    at org.apache.beam.sdk.Pipeline.apply(Pipeline.java:158)
    at WordCount.main(WordCount.java:188)
Caused by: java.io.IOException: Unable to find handler for
file:///home/punit/factordb_setup.txt
    at
org.apache.beam.sdk.util.IOChannelUtils.getFactory(IOChannelUtils.java:187)
    at org.apache.beam.sdk.io.TextIO$Read$Bound.apply(TextIO.java:302)
    ... 8 more

How do I circumvent this problem?

On Wed, May 11, 2016 at 5:56 AM, Dan Halperin <dhalperi@google.com> wrote:

> +Max and Kenn explicitly
>
> It looks like the Flink pipeline runner currently depends on the
> DataflowPipelineOptions, which may add some complicated runtime
> dependencies.
>
>
> https://github.com/apache/incubator-beam/blob/master/runners/flink/runner/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java#L21
>
> It seems conceivable that this could cause issues like this.
>
> Dan
>
> On Mon, May 9, 2016 at 12:51 PM, amir bahmanyari <amirtousa@yahoo.com>
> wrote:
>
>> "because currently it links in the original Dataflow runners by default"
>> Which jar(s) should be on the classpath to resolve "the original
>> Dataflow runners"?
>> In case of using FlinkPipelineRunner, I am getting this exception that
>> clearly references Dataflow runners but the Dataflow runners classes are
>> not fond.
>> Thanks for your help.
>>
>> Caused by: java.lang.NoClassDefFoundError:
>> org/apache/beam/runners/dataflow/*DataflowPipelineRunner*
>>         at org.apache.beam.runners.flink.FlinkPipelineRunner.fromOptions(
>> *FlinkPipelineRunner*.java:82)
>>         ... 20 more
>>
>>
>>
>> ------------------------------
>> *From:* Frances Perry <fjp@google.com>
>> *To:* user@beam.incubator.apache.org
>> *Sent:* Monday, May 9, 2016 7:32 AM
>> *Subject:* Re: Direct Runner Example
>>
>> The whole goal of Beam is that you won't need to change your pipeline
>> code to swap between runners. So like JB said, you should look in the
>> examples <https://github.com/apache/incubator-beam/tree/master/examples>
>> module. The idea is that you can use the --runner option to select from any
>> runner currently on your classpath. (Note that the Flink runner currently
>> has its own copy for legacy reasons -- we'll be removing that.)
>>
>> So for example, you can run with the direct runner like this:
>>
>>     $ mvn compile exec:java -pl examples/java
>> -Dexec.mainClass=org.apache.beam.examples.WordCount
>> -Dexec.args="--runner=DirectPipelineRunner --output=output"
>>
>> (We still need to fix the pom a bit to be runner-agnostic, because
>> currently it links in the original Dataflow runners by default.)
>>
>> You can also take a look at this Word Count Walkthrough
>> <https://cloud.google.com/dataflow/examples/wordcount-example> that
>> we'll be porting from Dataflow to Beam soon.
>>
>> Frances
>>
>>
>>
>> On Mon, May 9, 2016 at 4:36 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
>> wrote:
>>
>> Hi
>>
>> You have a word count sample in the examples module.
>>
>> Regards
>> JB
>>
>>
>> -------- Original message --------
>> From: Punit Naik <naik.punit44@gmail.com>
>> Date: 09/05/2016 12:56 (GMT+01:00)
>> To: user@beam.incubator.apache.org
>> Subject: Direct Runner Example
>>
>> Can I get a wordcount direct runner example (batch)?
>>
>> --
>> Thank You
>>
>> Regards
>>
>> Punit Naik
>>
>>
>>
>>
>>
>


-- 
Thank You

Regards

Punit Naik

Mime
View raw message