beam-user mailing list archives

From Chamikara Jayalath <chamik...@google.com>
Subject Re: Request payload size exceeds the limit: 10485760 bytes
Date Thu, 11 Jan 2018 06:48:10 GMT
The Dataflow service has a 10 MB limit on the size of a job creation request,
and it seems you are hitting it. See the following page for more information:
https://cloud.google.com/dataflow/pipelines/troubleshooting-your-pipeline

It looks like you are hitting this because of the number of partitions: each
partition adds its own transforms and sink to the job graph, which inflates
the request payload. I don't think there's currently a good solution other
than executing multiple smaller jobs. We hope to introduce the dynamic
destinations feature to the Python BigQuery sink in the near future, which
will allow you to write this as a more compact pipeline.
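For the "multiple jobs" workaround, one possible approach (a sketch, not from the thread; the chunk size and dates are illustrative) is to split the 180-day range into consecutive smaller windows and submit one Dataflow job per window, each with a job graph closer in size to the 8-day pipeline that already worked:

```python
from datetime import date, timedelta

def date_chunks(start, end, days_per_job):
    """Split the inclusive range [start, end] into consecutive
    (chunk_start, chunk_end) ranges of at most days_per_job days."""
    chunks = []
    cur = start
    while cur <= end:
        chunk_end = min(cur + timedelta(days=days_per_job - 1), end)
        chunks.append((cur, chunk_end))
        cur = chunk_end + timedelta(days=1)
    return chunks

# e.g. a 180-day range in 8-day windows -> 23 job submissions,
# each launched with the chunk's start/end dates as pipeline options
```

Each (start, end) pair would then be passed to a separate pipeline run, keeping every individual request under the 10 MB limit.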

Thanks,
Cham

On Wed, Jan 10, 2018 at 10:22 PM Unais Thachuparambil <
unais.thachuparambil@careem.com> wrote:

> I wrote a Python Dataflow job to read data from BigQuery, apply some
> transforms, and save the result as a BigQuery table.
>
> I tested it with 8 days of data and it works fine; when I scaled to 180
> days I'm getting the error below:
>
> ```"message": "Request payload size exceeds the limit: 10485760 bytes.",```
>
>
> ```pitools.base.py.exceptions.HttpError: HttpError accessing <
> https://dataflow.googleapis.com/v1b3/projects/careem-mktg-dwh/locations/us-central1/jobs?alt=json>:
> response: <{'status': '400', 'content-length': '145', 'x-xss-protection':
> '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
> 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF',
> '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Wed, 10
> Jan 2018 22:49:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc':
> 'hq=":443"; ma=2592000; quic=51303431; quic=51303339; quic=51303338;
> quic=51303337; quic=51303335,quic=":443"; ma=2592000; v="41,39,38,37,35"',
> 'content-type': 'application/json; charset=UTF-8'}>, content <{
> "error": {
> "code": 400,
> "message": "Request payload size exceeds the limit: 10485760 bytes.",
> "status": "INVALID_ARGUMENT"
> }
>
> ```
>
>
> In short, this is what I'm doing:
> 1 - Reading data from a BigQuery table using ```beam.io.BigQuerySource```
> 2 - Partitioning the data by day using ```beam.Partition```
> 3 - Applying transforms to each partition and combining some output
> PCollections.
> 4 - After the transforms, saving the results to a BigQuery
> date-partitioned table.
>
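As an illustration of step 2 above (a sketch only; the row field `event_date`, the start date, and the day count are assumptions, not details from the original pipeline), a partition function for `beam.Partition` that buckets rows by day might look like:

```python
from datetime import date

NUM_DAYS = 180
START_DATE = date(2017, 7, 15)  # hypothetical first day of the range

def day_index(row, num_partitions):
    """Partition function for beam.Partition: maps a row to the index
    of its day within the date range (0 .. num_partitions - 1)."""
    # Assumes each row is a dict carrying an 'event_date' datetime.date.
    # The modulo keeps unexpected dates in range rather than raising.
    return (row['event_date'] - START_DATE).days % num_partitions

# In the pipeline this would be applied as:
#   partitions = rows | beam.Partition(day_index, NUM_DAYS)
```

Each of the 180 resulting PCollections then gets its own transforms and sink, which is why the job graph (and the creation request) grows with the number of partitions.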
