From notifications-return-16167-archive-asf-public=cust-asf.ponee.io@libcloud.apache.org  Fri Oct  4 11:50:25 2019
Return-Path: <notifications-return-16167-archive-asf-public=cust-asf.ponee.io@libcloud.apache.org>
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153])
	by mx-eu-01.ponee.io (Postfix) with SMTP id 3D82D180651
	for <archive-asf-public@cust-asf.ponee.io>; Fri,  4 Oct 2019 13:50:25 +0200 (CEST)
Received: (qmail 20727 invoked by uid 500); 4 Oct 2019 11:50:24 -0000
Mailing-List: contact notifications-help@libcloud.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:notifications-help@libcloud.apache.org>
List-Unsubscribe: <mailto:notifications-unsubscribe@libcloud.apache.org>
List-Post: <mailto:notifications@libcloud.apache.org>
List-Id: <notifications.libcloud.apache.org>
Reply-To: dev@libcloud.apache.org
Delivered-To: mailing list notifications@libcloud.apache.org
Received: (qmail 20718 invoked by uid 99); 4 Oct 2019 11:50:24 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 04 Oct 2019 11:50:24 +0000
From: GitBox <git@apache.org>
To: notifications@libcloud.apache.org
Subject: [GitHub] [libcloud] pquentin opened a new pull request #1353: Reuse TCP
 connections when uploading files
Message-ID: <157018982455.8493.3125406669558884939.gitbox@gitbox.apache.org>
Date: Fri, 04 Oct 2019 11:50:24 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

pquentin opened a new pull request #1353: Reuse TCP connections when uploading files
URL: https://github.com/apache/libcloud/pull/1353
 
 
   ## Reuse TCP connections when uploading files
   
   ### Description
   
   It's easy to break connection reuse when using the requests API: just use `stream=True` and never read the response. The connection used to make the request will never be reused, and will be dropped when urllib3's connection pool is full.
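
A minimal sketch of the behaviour described above, assuming the `requests` package is installed. It does not use libcloud: the local test server, `CountingHandler` and `count_connections` are hypothetical names introduced only for this illustration. The server records the client address of every request, so the number of distinct addresses equals the number of TCP connections the client opened.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CountingHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: remembers which TCP connections it served."""

    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections alive by default
    seen_connections = set()

    def do_PUT(self):
        # Each TCP connection has its own (host, port) client address.
        self.seen_connections.add(self.client_address)
        # Consume the uploaded body so the next request starts cleanly.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def count_connections(stream, uploads=5):
    """Upload `uploads` tiny objects and return how many connections were opened."""
    CountingHandler.seen_connections = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/object" % server.server_address[1]
    responses = []
    try:
        with requests.Session() as session:
            session.trust_env = False  # ignore proxy settings from the environment
            for _ in range(uploads):
                # The response body is deliberately never read here.
                responses.append(session.put(url, data=b"x", stream=stream))
    finally:
        for response in responses:
            response.close()
        server.shutdown()
        server.server_close()
    return len(CountingHandler.seen_connections)


if __name__ == "__main__":
    print("stream=True: ", count_connections(stream=True))
    print("stream=False:", count_connections(stream=False))
```

With `stream=True` every upload should open its own connection, while with `stream=False` requests reads the body itself, returns the connection to the pool, and the same connection should serve all uploads.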
   
   It turns out uploading objects using the S3 API goes through `prepared_request`, which incorrectly sets `stream` to the value of `raw`, which is `True` in our case. And since we don't read the response data, connections are never reused, and each upload requires its own connection.
   
   This is particularly wasteful when uploading many small objects, as easily happens with JSON or Parquet files generated by Apache Spark: setting up a connection takes significant time compared to uploading a few bytes.
   
   Setting `stream=stream` in the `prepared_request` method matches the code in the `request` method and fixes the bug.
   
   ### Status
   
   - work in progress
   
   ### Checklist (tick everything that applies)
   
   - [x] [Code linting](http://libcloud.readthedocs.org/en/latest/development.html#code-style-guide) (required, can be done after the PR checks)
   - [x] Documentation
   - [x] [Tests](http://libcloud.readthedocs.org/en/latest/testing.html)
   - [x] [ICLA](http://libcloud.readthedocs.org/en/latest/development.html#contributing-bigger-changes) (required for bigger changes)
   
   cc @Kami @tonybaloney 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services