beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Baptiste Onofré>
Subject [HEADS UP] Using "new" filesystem layer
Date Thu, 04 May 2017 15:58:27 GMT
Hi guys,

One of key refactoring/new feature we bring in the first stable release is the 
"new" Beam filesystems.

I started to play with it on couple of use cases I have in beam-samples.

1/ TextIO.write() with unbounded PCollection (stream)

The first use case is the TextIO write with unbounded PCollection (good timing 
as we had a question yesterday about this on Slack).

I confirm that TextIO now supports unbounded PCollection. You have to create a 
Window and "flag" TextIO to use windowing.

Here's the code snippet:

                 .apply(MapElements.via(new SimpleFunction<JmsRecord, String>() {
                     public String apply(JmsRecord input) {
                         return input.getPayload();

Thanks to Dan, I found an issue in the watermark of JmsIO (as it uses the JMS 
ack to advance the watermark, it should not be auto but client ack). I'm 
preparing a PR for JmsIO about this.
However the "windowed" TextIO works fine.

2/ Beam HDFS filesystem

The other use case is to use the "new" Beam filesystem with TextIO, especially HDFS.

So, in my pipeline, I'm using:


In my pom.xml, I define both Beam hadoop-file-system and hadoop-client dependencies:


Unfortunately, when starting the pipeline, I have:

Exception in thread "main" java.lang.IllegalStateException: Unable to find 
registrar for hdfs

I gonna investigate tonight and I will let you know.

Jean-Baptiste Onofré
Talend -

View raw message