beam-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Davor Bonaci <da...@google.com>
Subject Re: TextIO with .withoutSharding() writing a only one shard of data and ignoring the rest
Date Wed, 01 Jun 2016 01:41:50 GMT
This will be a runner-specific issue. It would be the best to file a JIRA
issue for this.

On Tue, May 31, 2016 at 9:46 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
wrote:

> Hi Pawel,
>
> does it happen only with the Flink runner ? I bet it happens with any
> runner.
>
> Let me take a look.
>
> Regards
> JB
>
> On 05/30/2016 01:38 AM, Pawel Szczur wrote:
>
>> Hi,
>>
>> I'm running a pipeline with Flink backend, Beam bleeding edge, Oracle
>> Java 1.8, maven 3.3.3, linux64.
>>
>> The pipeline is run with --parallelism=6.
>>
>> Adding .withoutSharding()causes a TextIO sink to write only one of the
>> shards.
>>
>> Example use:
>> data.apply(TextIO.Write.named("write-debug-csv").to("/tmp/some-stats"));
>> vs.
>>
>> data.apply(TextIO.Write.named("write-debug-csv").to("/tmp/some-stats")*.withoutSharding()*);
>>
>> Result:
>> Only part of data is written to file. After comparing to sharded output,
>> it seems to be just one of shard files.
>>
>> Cheers,
>> Pawel
>>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Mime
View raw message