beam-dev mailing list archives

From Tanay Tummalapalli <ttanay...@gmail.com>
Subject Re: BQ IT tests fail on TestDataflowRunner - Python SDK
Date Tue, 04 Jun 2019 10:25:32 GMT
I didn't have any other changes.
I ran the tests with a clean virtualenv as you suggested and it works now.
:)

Thanks Ahmet and Chamikara!

On Tue, Jun 4, 2019 at 6:36 AM Chamikara Jayalath <chamikara@google.com>
wrote:

> Sounds like your input job was somehow incompatible with the Dataflow
> worker. Running using a clean virtual env should help verify as Ahmet
> mentioned.
>
> On Mon, Jun 3, 2019 at 5:44 PM Ahmet Altay <altay@google.com> wrote:
>
>> Do you have any other changes? Are you trying from head with a clean
>> virtual environment?
>>
>> If you can share a link to the Dataflow job (in the apache-beam-testing GCP
>> project), we can try to look at additional logs as well.
>>
>> On Mon, Jun 3, 2019 at 1:42 PM Tanay Tummalapalli <ttanay100@gmail.com>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I ran the Integration Tests -
>>> BigQueryStreamingInsertTransformIntegrationTests[1] and
>>> BigQueryFileLoadsIT[2] on the master branch locally, with the following
>>> command:
>>> ./scripts/run_integration_test.sh --test_opts --tests=apache_beam.io.gcp.bigquery_test:BigQueryStreamingInsertTransformIntegrationTests
>>> The Dataflow jobs for the tests failed with the following error:
>>> root: INFO: 2019-06-03T18:36:53.021Z: JOB_MESSAGE_ERROR: Traceback (most recent call last):
>>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 649, in do_work
>>>     work_executor.execute()
>>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 150, in execute
>>>     test_shuffle_sink=self._test_shuffle_sink)
>>>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 116, in create_operation
>>>     is_streaming=False)
>>>   File "apache_beam/runners/worker/operations.py", line 962, in apache_beam.runners.worker.operations.create_operation
>>>     op = BatchGroupAlsoByWindowsOperation(
>>>   File "dataflow_worker/shuffle_operations.py", line 219, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.__init__
>>>     self.windowing = deserialize_windowing_strategy(self.spec.window_fn)
>>>   File "dataflow_worker/shuffle_operations.py", line 207, in dataflow_worker.shuffle_operations.deserialize_windowing_strategy
>>>     return pickler.loads(serialized_data)
>>>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 248, in loads
>>>     c = base64.b64decode(encoded)
>>>   File "/usr/lib/python2.7/base64.py", line 78, in b64decode
>>>     raise TypeError(msg)
>>> TypeError: Incorrect padding
>>>
>>>
>>> I tested the same tests on the 2.13.0-RC#2 branch as well and they
>>> passed. These tests also don't fail in the most recent Python post-commit
>>> tests[3-5].
>>>
>>> Keeping in mind the recent b64 changes in BQ, none of the tests in the
>>> test classes mentioned above makes use of a "BYTES" type field.
>>> Would love to get pointers to possible reasons.
>>>
>>> Thank You
>>> - TT
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_test.py#L479-L630
>>> [2]
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py#L358-L528
>>> [3]
>>> https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/
>>> [4]
>>> https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/
>>> [5]
>>> https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/
>>>
>>
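
The traceback in the original message ends in base64.b64decode raising
"Incorrect padding" while the worker deserializes the windowing strategy.
The thread does not pin down the root cause beyond a stale local
environment, so the snippet below is only a minimal sketch of the error
class itself, not a reproduction of the Dataflow failure. The payload and
the try_decode helper are hypothetical stand-ins. Note that Python 2.7 (as
in the traceback) raises TypeError here, while Python 3 raises
binascii.Error, so the sketch catches both.

```python
import base64
import binascii


def try_decode(encoded):
    """Attempt a base64 decode; return (ok, decoded_bytes_or_error_text).

    Hypothetical helper for illustration only. It is not part of Beam
    or of the Dataflow worker.
    """
    try:
        return True, base64.b64decode(encoded)
    except (binascii.Error, TypeError) as exc:
        # Python 3 raises binascii.Error; Python 2.7 raised TypeError.
        return False, str(exc)


# Stand-in for a serialized object; the real payload in the thread is a
# pickled windowing strategy produced by the SDK that submitted the job.
original = b"hello"
payload = base64.b64encode(original)

# A payload that arrives intact decodes back to the original bytes.
ok_full, decoded = try_decode(payload)

# Dropping the trailing padding character leaves a length that is not a
# multiple of four, which the decoder rejects.
ok_cut, error_text = try_decode(payload[:-1])

print(ok_full, decoded)
print(ok_cut, error_text)
```

A mismatch between what the submitting SDK encodes and what the worker
expects to decode is one way such an error could surface, which would be
consistent with the clean virtualenv resolving it, but that link is an
assumption rather than something the thread confirms.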
