beam-commits mailing list archives

From "Vincent Spiewak (JIRA)" <>
Subject [jira] [Created] (BEAM-2840) BigQueryIO write is slow/fail with a bounded source
Date Tue, 05 Sep 2017 09:28:00 GMT
Vincent Spiewak created BEAM-2840:

             Summary: BigQueryIO write is slow/fail with a bounded source
                 Key: BEAM-2840
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-gcp
    Affects Versions: 2.0.0
         Environment: Google Cloud Platform
            Reporter: Vincent Spiewak
            Assignee: Chamikara Jayalath
         Attachments: Capture d’écran 2017-09-05 à 11.15.40.png

The BigQueryIO writer is slow or fails when the input source is bounded.

If the input source is bounded (GCS, a BigQuery select, ...), the BigQueryIO writer uses
"Method.FILE_LOADS" instead of streaming inserts.
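
To make the behavior described above concrete, here is a minimal, self-contained sketch of the selection rule. It does not use the Beam SDK: the class WriteMethodSketch and the helper defaultMethodFor are hypothetical names for illustration, and only the two constants mirror the "Method.FILE_LOADS" and "Method.STREAMING_INSERTS" values mentioned in this report.

```java
// Illustrative model only; this is NOT Beam's implementation.
public class WriteMethodSketch {

    // Mirrors the two write methods named in this report.
    enum Method { FILE_LOADS, STREAMING_INSERTS }

    // Hypothetical helper: a bounded input gets load jobs,
    // an unbounded input gets streaming inserts.
    static Method defaultMethodFor(boolean inputIsBounded) {
        return inputIsBounded ? Method.FILE_LOADS : Method.STREAMING_INSERTS;
    }

    public static void main(String[] args) {
        System.out.println("bounded input   -> " + defaultMethodFor(true));
        System.out.println("unbounded input -> " + defaultMethodFor(false));
    }
}
```

The point of the sketch is that the choice depends only on whether the input is bounded, so a pipeline reading from GCS or a BigQuery select always ends up on the file-load path.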

Large inputs (500 million rows) result in a java.lang.OutOfMemoryError: Java heap
space.

We cannot use "Method.STREAMING_INSERTS" or control the batch sizes, because the
relevant API is private :(

Someone reported a similar problem with GCS -> BQ on Stack Overflow:
"Why is writing to BigQuery from a Dataflow/Beam pipeline slow?"

This message was sent by Atlassian JIRA
