arrow-user mailing list archives

From: Antoine Pitrou <anto...@python.org>
Subject: Re: timeouts in S3 reads in pyarrow
Date: Mon, 29 Mar 2021 15:00:25 GMT
On Mon, 29 Mar 2021 10:52:04 -0400
Luke <virtualluke@gmail.com> wrote:
> No firewall in this setup.  And no errors when the keys are small that I
> have seen.  Also, using boto3 or requests there are no issues that I can
> tell (although it may be suppressing timeouts and retries).

How large is the file that gives an error, and how fast is your
connection to the S3 server?
Also, after how long do you get the timeout error?
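
One way to answer that last question is to time the failing call. The following is only a minimal sketch, not part of pyarrow: `time_until_error` is a hypothetical helper name, and the commented-out usage assumes the same `my_ak`, `my_sk` and `my_endpoint` variables as in the original report.

```python
import time


def time_until_error(operation):
    """Call operation() and return (result, error, elapsed_seconds).

    OSError is caught because that is the exception type shown in the
    reported traceback; any other exception propagates unchanged.
    """
    start = time.monotonic()
    result, error = None, None
    try:
        result = operation()
    except OSError as exc:
        error = exc
    elapsed = time.monotonic() - start
    return result, error, elapsed


# Hypothetical usage against the setup from the report (requires pyarrow
# and a reachable S3-compatible endpoint, so it is left commented out):
#
# from pyarrow import fs
# s3 = fs.S3FileSystem(access_key=my_ak, secret_key=my_sk,
#                      endpoint_override=my_endpoint, scheme='http')
# _, err, seconds = time_until_error(
#     lambda: s3.open_input_stream('test_bucket/example_key').readall())
# print(f"error={err!r} after {seconds:.1f}s")
```

If the elapsed time is roughly the same on every failure, that would suggest a fixed timeout somewhere along the path rather than a transfer that is merely slow.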

Regards

Antoine.


> 
> So you are saying the 'AWS Error [code 99]: curlCode: 28, Timeout was
> reached' is coming from the server?  Interesting, we haven't seen that
> before but our access before this was with requests or s3fs which uses
> boto3 under the hood and must handle this if so.
> 
> thanks, will explore more,
> Luke
> 
> On Mon, Mar 29, 2021 at 9:54 AM Antoine Pitrou <antoine@python.org> wrote:
> 
> >
> > Hi Luke,
> >
> > Given the error message, my intuition is that the timeout is on the
> > server side.  Arrow does not try to set any timeouts on S3 connections.
> >
> > Note that this message ("When reading information") happens *before*
> > reading the file data, simply when trying to read the file length.  So
> > perhaps something is weird in your network configuration (is a firewall
> > blocking packets?).
> >
> > Regards
> >
> > Antoine.
> >
> >
> >
> > On Sat, 27 Mar 2021 10:44:53 -0400
> > Luke <virtualluke@gmail.com> wrote:  
> > > > I have a local S3 compatible object store (using ceph) and am trying to use
> > > > the pyarrow fs interface.  This seems to work well except on larger objects
> > > > I am getting unhandled exceptions.  Is there a way to currently tune the
> > > > timeouts or retries?  Here is the kind of code and error I am seeing:
> > >
> > > from pyarrow import fs
> > >
> > >
> > >
> > > > s3 = fs.S3FileSystem(access_key=my_ak, secret_key=my_sk, endpoint_override=my_endpoint, scheme='http')
> > >
> > > raw = s3.open_input_stream('test_bucket/example_key').readall()
> > >
> > >
> > >
> > > > File "pyarrow/_fs.pyx", line 621, in pyarrow._fs.FileSystem.open_input_stream
> > > >
> > > > File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
> > > >
> > > > File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> > >
> > > OSError: When reading information for key 'example_key' in bucket
> > > 'test_bucket': AWS Error [code 99]: curlCode: 28, Timeout was reached
> > >
> > >
> > >
> > > --
> > >
> > > install details:
> > >
> > > python: python 3.8.6
> > >
> > > OS: linux, redhat 7.7
> > >
> > > pyarrow version: 3.0.0
> > >
> > >
> > > thanks for the help,
> > >
> > > Luke
> > >  
> >
> >
> >
> >  
> 



