beam-commits mailing list archives

From "Jonathan Delfour (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-3757) Shuffle read failed using python 2.2.0
Date Tue, 27 Feb 2018 21:05:00 GMT

     [ https://issues.apache.org/jira/browse/BEAM-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Delfour updated BEAM-3757:
-----------------------------------
    Description: 
Hi,

The first issue is that the Beam 2.3.0 Python SDK apparently does not work on GCP. The job gets stuck:
{noformat}
Workflow failed. Causes: (bf832d44290fbf41): The Dataflow appears to be stuck. You can get help with Cloud Dataflow at https://cloud.google.com/dataflow/support.
{noformat}
I tried twice.

Reverting to 2.2.0 usually works, but today, after more than an hour of processing with 30 workers, the job failed with the following in the logs:

{noformat}
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
    op.start()
  File "dataflow_worker/shuffle_operations.py", line 49, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    def start(self):
  File "dataflow_worker/shuffle_operations.py", line 50, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    with self.scoped_start_state:
  File "dataflow_worker/shuffle_operations.py", line 65, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    with self.shuffle_source.reader() as reader:
  File "dataflow_worker/shuffle_operations.py", line 67, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    for key_values in reader:
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 406, in __iter__
    for entry in entries_iterator:
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 248, in next
    return next(self.iterator)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 206, in __iter__
    chunk, next_position = self.reader.Read(start_position, end_position)
  File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 138, in shuffle_client.PyShuffleReader.Read
IOError: Shuffle read failed: INTERNAL: Received RST_STREAM with error code 2 talking to my-dataflow-02271107-756f-harness-2p65:12346
{noformat}

I also get this informational message:

{noformat}
Refusing to split <dataflow_worker.shuffle.GroupedShuffleRangeTracker object at 0x7f03a00fe790> at '\x00\x00\x00\t\x1d\x14\x87\xa3\x00\x01': unstarted
{noformat}

As for the pipeline: I extract data from BigQuery (BQ), clean it with pandas, export it as a CSV file, gzip it, and upload the compressed file to a bucket using decompressive transcoding (the CSV export, gzip compression, and upload happen on the same worker because they are done in the same beam.DoFn).
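For reference, the per-bundle work described above (CSV export plus gzip compression before the transcoded upload) can be sketched with the Python 3 standard library alone; the function name, header, and sample rows below are illustrative assumptions on my part, not taken from the reporter's actual DoFn:

```python
import csv
import gzip
import io

def rows_to_gzipped_csv(rows, header):
    """Serialize rows to CSV text and gzip-compress the result.

    Mirrors (loosely) the CSV-export + gzip step the reporter runs
    inside a single beam.DoFn; all names here are hypothetical.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    # Compress once; the object would then be uploaded to GCS with
    # content_type="text/csv" and content_encoding="gzip" so that
    # decompressive transcoding serves it decompressed to clients.
    return gzip.compress(buf.getvalue().encode("utf-8"))

payload = rows_to_gzipped_csv([(1, "a"), (2, "b")], ["id", "val"])
```

Doing all three steps in one DoFn keeps the intermediate CSV off the shuffle, which matches the reporter's description of a single 'worker' handling export, compression, and upload.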

PS: I can't find a reasonable way to export the logs from GCP, but I can privately send the log file I have from the run on my machine (the pipeline log).
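One possible way to pull Dataflow worker logs out of GCP (a suggestion on my part, not something the reporter tried) is the Cloud Logging CLI; the job ID below is a placeholder:

```shell
# Export Cloud Logging entries for a Dataflow job as JSON.
# JOB_ID is a placeholder; requires gcloud auth and the right project set.
gcloud logging read \
  'resource.type="dataflow_step" AND resource.labels.job_id="JOB_ID"' \
  --limit=1000 --format=json > dataflow-job.log
```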



> Shuffle read failed using python 2.2.0
> --------------------------------------
>
>                 Key: BEAM-3757
>                 URL: https://issues.apache.org/jira/browse/BEAM-3757
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.2.0
>         Environment: gcp, macos
>            Reporter: Jonathan Delfour
>            Assignee: Thomas Groh
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
