arrow-user mailing list archives

From Luke <virtuall...@gmail.com>
Subject Re: timeouts in S3 reads in pyarrow
Date Mon, 29 Mar 2021 14:52:04 GMT
No firewall in this setup, and I have seen no errors when the objects are
small.  Also, with boto3 or requests there are no issues that I can tell
(although they may be handling timeouts and retries silently).

So you are saying the 'AWS Error [code 99]: curlCode: 28, Timeout was
reached' is coming from the server?  Interesting.  We haven't seen that
before, but our access until now was through requests or s3fs, which uses
boto3 under the hood and must handle this if so.
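
In the meantime, a client-side retry around the read is one possible
workaround.  The following is only a minimal sketch, not anything pyarrow
provides: the helper name call_with_retries and its attempts/base_delay
parameters are made up for illustration, and it assumes the timeout keeps
surfacing as an OSError, as in the traceback quoted below.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying on OSError with exponential backoff.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            # Assumption: the S3 timeout is raised as an OSError.
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...


# Hypothetical usage with the filesystem object from the original post:
# raw = call_with_retries(
#     lambda: s3.open_input_stream('test_bucket/example_key').readall()
# )
```

Note that this retries the whole open-and-read, so a failure partway through
a large object downloads it again from the start.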

thanks, will explore more,
Luke

On Mon, Mar 29, 2021 at 9:54 AM Antoine Pitrou <antoine@python.org> wrote:

>
> Hi Luke,
>
> Given the error message, my intuition is that the timeout is on the
> server side.  Arrow does not try to set any timeouts on S3 connections.
>
> Note that this message ("When reading information") happens *before*
> reading the file data, simply when trying to read the file length.  So
> perhaps something is weird in your network configuration (is a firewall
> blocking packets?).
>
> Regards
>
> Antoine.
>
>
>
> On Sat, 27 Mar 2021 10:44:53 -0400
> Luke <virtualluke@gmail.com> wrote:
> > I have a local S3 compatible object store (using ceph) and am trying to use
> > the pyarrow fs interface.  This seems to work well except on larger objects
> > I am getting unhandled exceptions.  Is there a way to currently tune the
> > timeouts or retries?  Here is the kind of code and error I am seeing:
> >
> > from pyarrow import fs
> >
> > s3 = fs.S3FileSystem(access_key=my_ak, secret_key=my_sk,
> >                      endpoint_override=my_endpoint, scheme='http')
> >
> > raw = s3.open_input_stream('test_bucket/example_key').readall()
> >
> > File "pyarrow/_fs.pyx", line 621, in pyarrow._fs.FileSystem.open_input_stream
> > File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
> > File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> > OSError: When reading information for key 'example_key' in bucket 'test_bucket': AWS Error [code 99]: curlCode: 28, Timeout was reached
> >
> > --
> >
> > install details:
> > python: python 3.8.6
> > OS: linux, redhat 7.7
> > pyarrow version: 3.0.0
> >
> >
> > thanks for the help,
> >
> > Luke
> >
>
>
>
>
