beam-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Beam JIRA Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-8216) GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS bucket without proper permissions.
Date Mon, 10 Aug 2020 17:08:12 GMT

    [ https://issues.apache.org/jira/browse/BEAM-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174649#comment-17174649
] 

Beam JIRA Bot commented on BEAM-8216:
-------------------------------------

This issue is P2 but has been unassigned without any comment for 60 days so it has been labeled
"stale-P2". If this issue is still affecting you, we care! Please comment and remove the label.
Otherwise, in 14 days the issue will be moved to P3.

Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed explanation
of what these priorities mean.


> GCS IO fails with uninformative 'Broken pipe' errors while attempting to write to a GCS
bucket without proper permissions.
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8216
>                 URL: https://issues.apache.org/jira/browse/BEAM-8216
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>            Reporter: Valentyn Tymofieiev
>            Priority: P2
>              Labels: stale-P2, starter
>
> Obvserved while executing a wordcount IT pipeline:
> {noformat}
>  ./gradlew :sdks:python:test-suites:dataflow:py36:integrationTest \
> -Dtests=apache_beam.examples.wordcount_it_test:WordCountIT.test_wordcount_it \
> -Dattr=IT -DpipelineOptions="--project=some_project_different_from_apache_beam_testing
\
> --staging_location=gs://some_bucket/ \
> --temp_location=gs://some_bucket/ \
> --input=gs://apache-beam-samples/input_small_files/ascii_sort_1MB_input.0000* \
> --output=gs://temp-storage-for-end-to-end-tests/py-it-cloud/output  \
> --expect_checksum=ea0ca2e5ee4ea5f218790f28d0b9fe7d09d8d710 \
> --num_workers=10 \
> --autoscaling_algorithm=NONE \
> --runner=TestDataflowRunner \
> --sdk_location=/full/path/to/beam/sdks/python/dist/apache-beam-2.16.0.dev0.tar.gz" \
> --info  
> {noformat}
> gs://temp-storage-for-end-to-end-tests/py-it-cloud/output lives in a different project
than was running the pipeline.
> This caused a bunch of Broken pipe errors. Console logs:
> {noformat}
> root: INFO: 2019-09-11T19:06:23.055Z: JOB_MESSAGE_BASIC: Finished operation read/Read+split+pair_with_one+group/Reify+group/Write
> root: INFO: 2019-09-11T19:06:23.157Z: JOB_MESSAGE_BASIC: Executing operation group/Close
> root: INFO: 2019-09-11T19:06:23.208Z: JOB_MESSAGE_BASIC: Finished operation group/Close
> root: INFO: 2019-09-11T19:06:23.263Z: JOB_MESSAGE_BASIC: Executing operation group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
> root: INFO: 2019-09-11T19:06:25.571Z: JOB_MESSAGE_ERROR: Traceback (most recent call
last):
>   File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/iobase.py", line 1042,
in process
>     self.writer.write(element)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line
393, in write
>     self.sink.write_record(self.temp_handle, value)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filebasedsink.py", line
137, in write_record
>     self.write_encoded_record(file_handle, self.coder.encode(value))
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/textio.py", line 407, in
write_encoded_record
>     file_handle.write(encoded_value)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/filesystemio.py", line
202, in write
>     self._uploader.put(b)
>   File "/usr/local/lib/python3.6/site-packages/apache_beam/io/gcp/gcsio.py", line 594,
in put
>     self._conn.send_bytes(data.tobytes())
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
>     self._send_bytes(m[offset:offset + size])
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 397, in _send_bytes
>     self._send(header)
>   File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 368, in _send
>     n = write(self._handle, buf)
> BrokenPipeError: [Errno 32] Broken pipe
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line
649, in do_work
> ...
> root: INFO: 2019-09-11T19:06:33.027Z: JOB_MESSAGE_DEBUG: Executing failure step failure25
> root: INFO: 2019-09-11T19:06:33.066Z: JOB_MESSAGE_ERROR: Workflow failed. Causes: S08:group/Read+group/GroupByWindow+count+format+write/Write/WriteImpl/WriteBundles/WriteBundles+write/Write/WriteImpl/Pair+write/Write/WriteImpl/WindowInto(WindowIntoFn)+write/Write/WriteImpl/GroupByKey/Reify+write/Write/WriteImpl/GroupByKey/Write
failed., The job failed because a work item has failed 4 times. Look in previous log entries
for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors.
The work item was attempted on these workers:
>   beamapp-valentyn-09111855-09111155-pj3z-harness-5g6h
>       Root cause: Work item failed.,
>   beamapp-valentyn-09111855-09111155-pj3z-harness-6ccc
>       Root cause: Work item failed.,
>   beamapp-valentyn-09111855-09111155-pj3z-harness-45pp
>       Root cause: Work item failed.,
>   beamapp-valentyn-09111855-09111155-pj3z-ha
> {noformat}
> Errors were gone after I changed the bucket to a bucket in the project where I ran the
pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message