cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Burwell <>
Subject Re: [MERGE]object_store branch into master
Date Thu, 23 May 2013 19:54:44 GMT

TL;DR There are rare circumstances where directly access the AWS HTTP API makes sense, but
generally, you should use a client.

If you need the size of an S3 object in advance, I recommend using AmazonS3Client#getObjectMeta
method.  Also, large swathes of the class appear to be copied directly from HTTPTemplateDownloader
and don't make sense when interacting S3.  For example, S3 does not support chunked HTTP operations
or HTTP authentication.  As such, this class could be greatly simplified to operate in terms
of S3Utils (likely with a few additional methods) to get this information.  

In terms of progress tracking of downloads and transfer operations, I recommend retrofitting
S3Utils to leverage the TransferManager mechanism.  In addition to providing progress information
(TransferProgress), it also provides various operation optimizations (particularly for uploads).
 It also simplifies the interaction tremendously (e.g. it builds the PutRequest objects for


On May 23, 2013, at 12:12 PM, Min Chen <> wrote:

> John, sorry for my ignorance. In, we are
> only using HttpClient method to get the size and InputStream from a given
> URL, then we are invoking S3 client library putObject(PutObjectRequest) to
> download the content to S3. By using PutObjectRequest, I can set
> ProgressListener to show download progress to end user. What is your
> recommendation in this case?
> Thanks
> -min
> On 5/23/13 7:20 AM, "John Burwell" <> wrote:
>> Min,
>> The is directly accessing
>> the S3 API using HTTP client.
>> Thanks,
>> John
>> On May 22, 2013, at 5:57 PM, Min Chen <> wrote:
>>> John,
>>> 	Can you clarify a bit on your last comment about directly accessing S3
>>> HTTP API? We are only invoking routines in S3Utils to perform operations
>>> with S3, not invoke any REST api if that is what you meant.
>>> 	Thanks
>>> 	-min
>>> On 5/22/13 2:49 PM, "John Burwell" <> wrote:
>>>> Edison,
>>>> For changes that take as long as described, it should be expected that
>>>> the review will take a proportional amount of time.  In future
>>>> releases, we should think through ways to divide changes such as these
>>>> into a set of smaller patches submitted throughout the course of the
>>>> release cycle.
>>>> So far, I can say I am very concerned about failure scenarios and
>>>> potential race conditions around the NFS cache. However, I am only a
>>>> quarter of the way through the code so my concerns may be resolved by
>>>> the end of the process.
>>>> I am also concerned about the correctness S3 implementation.  Why did
>>>> you choose to directly access the S3 HTTP API rather using the client
>>>> library?
>>>> Thanks,
>>>> -John
>>>> On May 22, 2013, at 5:25 PM, Edison Su <> wrote:
>>>>>> -----Original Message-----
>>>>>> From: Chip Childers []
>>>>>> Sent: Wednesday, May 22, 2013 1:26 PM
>>>>>> To:
>>>>>> Subject: Re: [MERGE]object_store branch into master
>>>>>> On Wed, May 22, 2013 at 08:15:41PM +0000, Animesh Chaturvedi wrote:
>>>>>>>> -----Original Message-----
>>>>>>>> From: Chip Childers []
>>>>>>>> Sent: Wednesday, May 22, 2013 12:08 PM
>>>>>>>> To:
>>>>>>>> Subject: Re: [MERGE]object_store branch into master
>>>>>>>> On Wed, May 22, 2013 at 07:00:51PM +0000, Animesh Chaturvedi
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: John Burwell []
>>>>>>>>>> Sent: Tuesday, May 21, 2013 8:51 AM
>>>>>>>>>> To:
>>>>>>>>>> Subject: Re: [MERGE]object_store branch into master
>>>>>>>>>> Edison,
>>>>>>>>>> Thanks, I will start going through it today.  Based
on other
>>>>>>>>>> $dayjob responsibilities, it may take me a couple
of days.
>>>>>>>>>> Thanks,
>>>>>>>>>> -John
>>>>>>>>> [Animesh>] John we are just a few days away  from
4.2 feature
>>>>>>>>> freeze, can
>>>>>>>> you provide your comments by Friday 5/24.   I would like
>>>>>>>> feature
>>>>>> threads
>>>>>>>> to be resolved sooner so that we don't have last minute rush.
>>>>>>>> I'm just going to comment on this, but not take it much further...
>>>>>>>> this type of change is an "architectural" change.  We had
>>>>>>>> previously
>>>>>>>> discussed (on several
>>>>>>>> threads) that the appropriate time for this sort of thing
to hit
>>>>>>>> master was
>>>>>>>> *early* in the release cycle.  Any reason that that consensus
>>>>>>>> doesn't apply here?
>>>>>>> [Animesh>] Yes it is an architectural change and discussion
on this
>>>>>>> started a
>>>>>> few weeks back already, Min and Edison wanted to get it in sooner
>>>>>> 4/30
>>>>>> but it took longer than anticipated in  preparing for merge and
>>>>>> testing on
>>>>>> feature branch.
>>>>>> You're not following me I think.  See this thread on the Javelin
>>>>>> merge:
>>>>>> We have discussed that our preference is for architectural changes
>>>>>> hit
>>>>>> master shortly after a feature branch is cut.  Why are we not doing
>>>>>> that here?
>>>>> This kind of refactor takes time, a lot of time. I think I worked on
>>>>> the merge of primary storage refactor into master and bug fixes during
>>>>> March(
>>>>> 9)
>>>>> , then started to work on the secondary storage refactor in
>>>>> April( Min and I
>>>>> finished
>>>>> the coding at end of April, then tested for two weeks, send out the
>>>>> merge request at middle of May.
>>>>> With the refactor, the  storage code will be much cleaner, and the
>>>>> performance of S3 will be improved, and integration with other storage
>>>>> vendor will be much easier, and the quality is ok(33 bugs fired, only
>>>>> 5
>>>>> left: 
>>>>> ef
>>>>> actor%22). Anyway, it's up to the community to decide, merge it or
>>>>> not,
>>>>> we already tried our best to get it done ASAP.

View raw message