libcloud-notifications mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [libcloud] pquentin opened a new pull request #1353: Reuse TCP connections when uploading files
Date Fri, 04 Oct 2019 11:50:24 GMT
pquentin opened a new pull request #1353: Reuse TCP connections when uploading files
URL: https://github.com/apache/libcloud/pull/1353
 
 
   ## Reuse TCP connections when uploading files
   
   ### Description
   
   It's easy to break connection reuse when using the requests API: just pass `stream=True`
and never read the response. The connection used to make the request will never be reused,
and will be dropped when urllib3's connection pool is full.
   
   It turns out that uploading objects using the S3 API goes through `prepared_request`, which
incorrectly sets `stream` to the value of `raw`, which is `True` in our case. Since we don't read
the response data, the connections are never reused, and each upload requires its own connection.
   
   This is particularly wasteful when uploading many small objects, which can easily happen
with JSON or Parquet files generated by Apache Spark, where setting up the connection takes
significant time compared to uploading a few bytes.
   
   Passing `stream=stream` in the `prepared_request` method, instead of the value of `raw`,
matches the code in the `request` method and fixes the bug.
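
To make the effect concrete, here is a minimal toy model of the behaviour described above. It is only a sketch: `ToyPool`, `ToyResponse`, `send`, and the simplified `prepared_request_*` signatures are hypothetical stand-ins, not the real libcloud, requests, or urllib3 code. The one assumption it encodes is that a connection goes back to the pool only once the response body has been read, and that a non-streaming send reads the body eagerly.

```python
class ToyPool:
    """Hypothetical stand-in for a connection pool (not urllib3's real code)."""

    def __init__(self):
        self.idle = []    # connections available for reuse
        self.opened = 0   # how many new connections had to be created

    def acquire(self):
        if self.idle:
            return self.idle.pop()
        self.opened += 1
        return "conn-%d" % self.opened

    def release(self, conn):
        self.idle.append(conn)


class ToyResponse:
    """The connection returns to the pool only once the body has been read."""

    def __init__(self, pool, conn):
        self._pool = pool
        self._conn = conn

    def read(self):
        if self._conn is not None:
            self._pool.release(self._conn)
            self._conn = None
        return b""


def send(pool, stream):
    """Mimics a session send: without streaming, the body is read eagerly."""
    response = ToyResponse(pool, pool.acquire())
    if not stream:
        response.read()
    return response


def prepared_request_buggy(pool, raw=False, stream=False):
    return send(pool, stream=raw)      # forwards `raw` as `stream`


def prepared_request_fixed(pool, raw=False, stream=False):
    return send(pool, stream=stream)   # forwards `stream` itself


def connections_opened(prepared_request, uploads):
    pool = ToyPool()
    for _ in range(uploads):
        # Like the upload path described above: raw=True, response never read.
        prepared_request(pool, raw=True)
    return pool.opened


if __name__ == "__main__":
    print("buggy:", connections_opened(prepared_request_buggy, 100))
    print("fixed:", connections_opened(prepared_request_fixed, 100))
```

In this model, 100 uploads through the buggy variant open 100 connections, while the fixed variant opens a single connection and reuses it for every upload.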
   
   ### Status
   
   - work in progress
   
   ### Checklist (tick everything that applies)
   
   - [x] [Code linting](http://libcloud.readthedocs.org/en/latest/development.html#code-style-guide)
(required, can be done after the PR checks)
   - [x] Documentation
   - [x] [Tests](http://libcloud.readthedocs.org/en/latest/testing.html)
   - [x] [ICLA](http://libcloud.readthedocs.org/en/latest/development.html#contributing-bigger-changes)
(required for bigger changes)
   
   cc @Kami @tonybaloney 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
