arrow-user mailing list archives

From Luke <virtuall...@gmail.com>
Subject timeouts in S3 reads in pyarrow
Date Sat, 27 Mar 2021 14:44:53 GMT
I have a local S3-compatible object store (using Ceph) and am trying to use
the pyarrow fs interface.  This seems to work well, except that on larger
objects I am getting unhandled exceptions.  Is there currently a way to tune
the timeouts or retries?  Here is the kind of code and error I am seeing:

from pyarrow import fs

s3 = fs.S3FileSystem(
    access_key=my_ak,
    secret_key=my_sk,
    endpoint_override=my_endpoint,
    scheme='http',
)

raw = s3.open_input_stream('test_bucket/example_key').readall()



File "pyarrow/_fs.pyx", line 621, in
pyarrow._fs.FileSystem.open_input_stream

File "pyarrow/error.pxi", line 122, in
pyarrow.lib.pyarrow_internal_check_status

File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status

OSError: When reading information for key 'example_key' in bucket
'test_bucket': AWS Error [code 99]: curlCode: 28, Timeout was reached
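
In the meantime I've been considering a plain retry loop at the application
level, something like the sketch below.  I'm assuming here that the timeout
always surfaces as an OSError, and the attempt count and backoff values are
just placeholders:

import time

from pyarrow import fs

def read_with_retries(s3, path, attempts=3, backoff=1.0):
    # Retry the whole read, since the timeout surfaces as an unhandled
    # OSError rather than something pyarrow retries internally.
    for attempt in range(attempts):
        try:
            return s3.open_input_stream(path).readall()
        except OSError:
            if attempt == attempts - 1:
                raise
            # Simple linear backoff between attempts.
            time.sleep(backoff * (attempt + 1))

raw = read_with_retries(s3, 'test_bucket/example_key')

But it would be nicer to tune the timeouts/retries in the S3 client itself,
if that's exposed somewhere.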



--

install details:

Python: 3.8.6

OS: Linux, Red Hat 7.7

pyarrow version: 3.0.0


thanks for the help,

Luke
