To: user@arrow.apache.org
From: Antoine Pitrou
Subject: Re: timeouts in S3 reads in pyarrow
Date: Mon, 29 Mar 2021 17:00:25 +0200

On Mon, 29 Mar 2021 10:52:04 -0400
Luke wrote:
> No firewall in this setup.  And no errors when the keys are small that I
> have seen.  Also, using boto3 or requests there are no issues that I can
> tell (although it may be suppressing timeouts and retries).

How large is the file that gives an error, and how fast is your
connection to the S3 server?  Also, after how long do you get the
timeout error?

Regards

Antoine.

>
> So you are saying the 'AWS Error [code 99]: curlCode: 28, Timeout was
> reached' is coming from the server?  Interesting, we haven't seen that
> before but our access before this was with requests or s3fs which uses
> boto3 under the hood and must handle this if so.
>
> thanks, will explore more,
> Luke
>
> On Mon, Mar 29, 2021 at 9:54 AM Antoine Pitrou wrote:
>
> > Hi Luke,
> >
> > Given the error message, my intuition is that the timeout is on the
> > server side.  Arrow does not try to set any timeouts on S3 connections.
> >
> > Note that this message ("When reading information") happens *before*
> > reading the file data, simply when trying to read the file length.  So
> > perhaps something is weird in your network configuration (is a firewall
> > blocking packets?).
> >
> > Regards
> >
> > Antoine.
> >
> > On Sat, 27 Mar 2021 10:44:53 -0400
> > Luke wrote:
> > > I have a local S3 compatible object store (using ceph) and am trying to use
> > > the pyarrow fs interface.  This seems to work well except on larger objects
> > > I am getting unhandled exceptions.  Is there a way to currently tune the
> > > timeouts or retries?  Here is the kind of code and error I am seeing:
> > >
> > > from pyarrow import fs
> > >
> > > s3 = fs.S3FileSystem(access_key=my_ak,secret_key=my_sk,endpoint_override=my_endpoint,scheme='http')
> > >
> > > raw = s3.open_input_stream('test_bucket/example_key').readall()
> > >
> > >   File "pyarrow/_fs.pyx", line 621, in pyarrow._fs.FileSystem.open_input_stream
> > >   File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
> > >   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> > > OSError: When reading information for key 'example_key' in bucket
> > > 'test_bucket': AWS Error [code 99]: curlCode: 28, Timeout was reached
> > >
> > > --
> > >
> > > install details:
> > > python: python 3.8.6
> > > OS: linux, redhat 7.7
> > > pyarrow version: 3.0.0
> > >
> > > thanks for the help,
> > > Luke
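
The question about tuning retries can be approximated on the client side.
What follows is a minimal sketch, not a pyarrow feature: read_with_retries
and its parameters (max_attempts, base_delay, sleep) are hypothetical names
introduced here for illustration. It assumes the timeout surfaces as an
OSError, as in the traceback quoted above. It also records how long each
attempt took, which may help answer how long the read runs before the
timeout is reported.

```python
import time


def read_with_retries(read_fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call read_fn() until it succeeds or max_attempts is exhausted.

    read_fn is any zero-argument callable that returns the object's bytes.
    Returns (data, attempts), where attempts holds one
    (elapsed_seconds, error_or_None) tuple per attempt made.
    The last OSError is re-raised if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempts = []
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            data = read_fn()
        except OSError as exc:
            # Assumption: the S3 timeout is reported as OSError.
            attempts.append((time.monotonic() - start, exc))
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            attempts.append((time.monotonic() - start, None))
            return data, attempts


# Possible usage with the filesystem from the original message
# (requires pyarrow and a reachable endpoint, so it is left commented out):
#   from pyarrow import fs
#   s3 = fs.S3FileSystem(access_key=my_ak, secret_key=my_sk,
#                        endpoint_override=my_endpoint, scheme='http')
#   raw, attempts = read_with_retries(
#       lambda: s3.open_input_stream('test_bucket/example_key').readall())
```

Retrying only hides the symptom. If the per-attempt timings are all close to
the same value, that points toward a fixed timeout somewhere between the
client and the object store, which is worth investigating separately.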