From notifications-return-15264-archive-asf-public=cust-asf.ponee.io@libcloud.apache.org Sun May 26 19:27:08 2019
Return-Path:
X-Original-To: archive-asf-public@cust-asf.ponee.io
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 7EF6D18062B for ; Sun, 26 May 2019 21:27:08 +0200 (CEST)
Received: (qmail 38381 invoked by uid 500); 26 May 2019 19:27:07 -0000
Mailing-List: contact notifications-help@libcloud.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@libcloud.apache.org
Delivered-To: mailing list notifications@libcloud.apache.org
Received: (qmail 38371 invoked by uid 99); 26 May 2019 19:27:07 -0000
Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 26 May 2019 19:27:07 +0000
From: GitBox
To: notifications@libcloud.apache.org
Subject: [GitHub] [libcloud] c-w commented on a change in pull request #1287: [LIBCLOUD-1043] Fix Azure upload_object_via_stream used with iter
Message-ID: <155889882746.11069.3869366733680166719.gitbox@gitbox.apache.org>
Date: Sun, 26 May 2019 19:27:07 -0000
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

c-w commented on a change in pull request #1287: [LIBCLOUD-1043] Fix Azure upload_object_via_stream used with iter
URL: https://github.com/apache/libcloud/pull/1287#discussion_r287611116

##########
File path: libcloud/storage/drivers/azure_blobs.py
##########
@@ -825,7 +826,12 @@ def upload_object_via_stream(self, iterator, container, object_name,
         """
         self._check_values(ex_blob_type, ex_page_blob_size)
         if ex_blob_type == "BlockBlob":
-            iterator.seek(0, os.SEEK_END)
+            try:
+                iterator.seek(0, os.SEEK_END)
+            except AttributeError:
+                buffer = BytesIO()
+                buffer.writelines(iterator)

Review comment:
Yes, this does buffer the whole iterator in memory, unfortunately.
Given that [the `Content-Length` header is required](https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob#request-headers-all-blob-types), we have to find the size of the iterator before making the request, so I don't see a way to avoid this in the general case. For example, using [tee](https://docs.python.org/3/library/itertools.html#itertools.tee) followed by something like [ilen](https://more-itertools.readthedocs.io/en/stable/_modules/more_itertools/more.html#ilen) would still copy the iterator's contents into memory, because `tee` has to hold every item that one copy has consumed until the other copy reads it.

One potential workaround would be to upload chunks from the iterator via individual [put block](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block) requests followed by a [put block list](https://docs.microsoft.com/en-us/rest/api/storageservices/put-block-list) request, but that would be a much more invasive change to the codebase. (Note that I haven't proved out this approach in code yet, so for now it's just a hypothesis from reading the docs.)

Do you have any suggestions to improve this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: users@infra.apache.org

With regards,
Apache Git Services
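
For reference, the two options discussed in the review comment above can be sketched roughly as follows. This is an illustration only, not code from the pull request: the helper names `buffer_iterator`, `iter_blocks` and `make_block_id` are hypothetical, the 4 MiB block size is an arbitrary choice, and no Azure requests are made. The first helper mirrors the buffering in the diff; the second only regroups the iterator into fixed-size blocks that a put block / put block list implementation could send one at a time.

```python
import base64
from io import BytesIO


def buffer_iterator(iterator):
    """Drain an iterator of bytes chunks into memory; return (buffer, size)."""
    buffer = BytesIO()
    buffer.writelines(iterator)
    size = buffer.tell()  # total number of bytes written
    buffer.seek(0)        # rewind so the caller can read from the start
    return buffer, size


def make_block_id(index):
    """Hypothetical ID scheme: fixed-width counter, Base64-encoded.

    Assumption from the put block docs: block IDs are Base64 strings and
    must all have the same length within one blob.
    """
    return base64.b64encode(("%010d" % index).encode("ascii")).decode("ascii")


def iter_blocks(iterator, block_size=4 * 1024 * 1024):
    """Regroup bytes chunks into (block_id, data) pairs of at most block_size."""
    pending = bytearray()
    index = 0
    for chunk in iterator:
        pending.extend(chunk)
        # Emit as many full blocks as the pending data allows.
        while len(pending) >= block_size:
            yield make_block_id(index), bytes(pending[:block_size])
            del pending[:block_size]
            index += 1
    if pending:
        # Final, possibly shorter, block.
        yield make_block_id(index), bytes(pending)
```

With `iter_blocks`, only one block plus the current chunk is held in memory at a time, and the size of each block is known before it would be sent. The actual requests and the final commit of the block ID list are left out, since that part is unverified.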