beam-commits mailing list archives

From "Tobias Feldhaus (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (BEAM-1997) Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)
Date Tue, 18 Apr 2017 18:32:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973245#comment-15973245 ]

Tobias Feldhaus edited comment on BEAM-1997 at 4/18/17 6:31 PM:
----------------------------------------------------------------

You are correct, I posted the wrong screenshots, sorry. I will rerun it and post the correct
ones. I did run it with the mentioned number of files, though. To save time, I ended up
uncompressing the gzip files for the test runs. In any case, while rerunning I will also move
out the {{ParseIntoJson}}.


was (Author: james-woods):
You are correct, I've posted the wrong screenshots, sorry. I will rerun it and post correct
ones. I did run it with the mentioned number of files though. Nevertheless while doing that
I will already move out the {{ParseIntoJson}}.

> Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1997
>                 URL: https://issues.apache.org/jira/browse/BEAM-1997
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 0.6.0
>            Reporter: Tobias Feldhaus
>            Assignee: Daniel Halperin
>
> After switching from Dataflow SDK 1.9 to Apache Beam SDK 0.6, my pipeline no longer runs
> with 180 output days (BigQuery partitions as sinks), but only with 60 output days. With a
> larger number, the response from the Cloud Dataflow service under Beam reads as follows:
> {code}
> Failed to create a workflow job: The size of the serialized JSON representation of the pipeline exceeds the allowable limit. For more information, please check the FAQ link below:
> {code}
> This is the pipeline in dataflow: https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
> The resulting graph in Dataflow looks like this: 
> https://puu.sh/vhWAW/a12f3246a1.png
> This is the same pipeline in beam: https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
> The constructed graph looks somewhat different:
> https://puu.sh/vhWvm/78a40d422d.png
> Methods used are taken from this example https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6
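
One way to picture why a limit like this could be hit as the number of output days grows: if every output day contributes its own sink step to the pipeline graph, the serialized job description grows roughly linearly with the number of days. The following is a toy sketch in plain Python, not Beam or Dataflow code. The step names, the `kind` values and the table naming are hypothetical, and the real serialized format and the actual size limit are not given in this message.

```python
import json


def toy_pipeline_json(num_output_days: int) -> str:
    """Serialize a toy pipeline graph with one sink step per output day.

    Assumption for illustration only: each output day adds exactly one
    step to the graph. Real runner graphs are more complex than this.
    """
    steps = [{"name": "ReadFiles", "kind": "ToyRead"}]
    for day in range(num_output_days):
        steps.append({
            "name": f"WriteDay{day:03d}",  # hypothetical step name
            "kind": "ToyWrite",            # hypothetical step kind
            "properties": {"table": f"dataset.events_day{day:03d}"},  # hypothetical table
        })
    return json.dumps({"steps": steps})


size_60 = len(toy_pipeline_json(60))
size_180 = len(toy_pipeline_json(180))

# Compare the serialized sizes for 60 and 180 output days.
print(f"60 days:  {size_60} characters")
print(f"180 days: {size_180} characters")
print(f"ratio:    {size_180 / size_60:.2f}")
```

In this toy model every extra output day adds a fixed number of characters, so tripling the number of days roughly triples the serialized size. Whether the real pipeline behaves this way depends on how the runner expands each sink, which this sketch does not model.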



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
