maven-issues mailing list archives

From "Michael Osipov (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (WAGON-537) Maven transfer speed of large artifacts is slow due to unsuitable buffer strategy
Date Sat, 10 Nov 2018 19:55:00 GMT

     [ https://issues.apache.org/jira/browse/WAGON-537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Osipov updated WAGON-537:
---------------------------------
    Fix Version/s: 3.3.0

> Maven transfer speed of large artifacts is slow due to unsuitable buffer strategy
> ---------------------------------------------------------------------------------
>
>                 Key: WAGON-537
>                 URL: https://issues.apache.org/jira/browse/WAGON-537
>             Project: Maven Wagon
>          Issue Type: Improvement
>          Components: wagon-http, wagon-provider-api
>    Affects Versions: 3.2.0
>         Environment: Windows 10, JDK 1.8, Nexus artifact store, > 100 MB/s network connection.
>            Reporter: Olaf Otto
>            Assignee: Michael Osipov
>            Priority: Major
>              Labels: performance
>             Fix For: 3.3.0
>
>         Attachments: wagon-issue.png
>
>
> We are using Maven for build process automation with Docker. This sometimes involves uploading and downloading artifacts a few gigabytes in size. Here, Maven's transfer speed is consistently and reproducibly slow. For instance, an artifact 7.5 GB in size took almost two hours to transfer, in spite of a 100 MB/s connection and the fact that a browser reproducibly reaches that download speed against the same remote Nexus artifact repository. The same is true when uploading such an artifact.
> I have investigated the issue using JProfiler. The profile shows a hotspot in AbstractWagon's transfer(Resource resource, InputStream input, OutputStream output, int requestType, long maxSize) method used for remote artifacts, and the same issue in AbstractHttpClientWagon#writeTo(OutputStream).
> Here, the input stream is read in a loop using a 4 KB buffer. Whenever data is received, it is pushed to the downstream listeners via fireTransferProgress. These listeners (or rather consumers) perform expensive tasks.
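> For illustration, a minimal sketch of the copy-loop pattern described above (simplified, not the verbatim Wagon code):
{code:java}
// Simplified sketch of the pattern described above: a small fixed
// buffer, and a blocking listener callback on every single read,
// however few bytes that read actually returned.
byte[] buffer = new byte[4096]; // 4 KB
int n;
while ((n = input.read(buffer, 0, buffer.length)) != -1) {
    fireTransferProgress(transferEvent, buffer, n); // fires once per read
    output.write(buffer, 0, n);
}
{code}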
> Now, the underlying InputStream implementation used in transfer returns from read(buffer, offset, length) as soon as *some* data is available. That is, fireTransferProgress may well be invoked with an average number of bytes less than half the buffer capacity (this varies with the underlying network and hardware architecture). Consequently, fireTransferProgress is invoked *millions of times* for large files. As this is a blocking operation, the time spent in fireTransferProgress dominates and slows the transfer down by at least one order of magnitude.
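> For a rough sense of scale (assuming reads return ~2 KB on average, i.e. half of the 4 KB buffer): 7.5 GB / 2 KB ≈ 3.9 million fireTransferProgress invocations for a single transfer.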
> !wagon-issue.png! 
> In our case, we found the download time increased from a theoretical optimum of ~80 seconds to more than 3200 seconds.
> From an architectural perspective, I would not want to make the consumers/listeners invoked via fireTransferProgress aware of their potential impact on transfer speed, but rather refactor the transfer method to use a buffer strategy that reduces the number of fireTransferProgress invocations. This strategy should take the expected file size of the transfer into account, such that fireTransferProgress is invoked often enough but not too frequently; a sketch follows below.
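> A possible shape for such a strategy, as a sketch only (MIN_BUFFER, MAX_BUFFER and bufferSizeFor are hypothetical names, not existing Wagon API): derive the buffer size from the expected transfer size and fill the buffer completely before notifying, so listeners such as checksum observers still see every byte, but the event fires roughly once per buffer instead of once per network read.
{code:java}
// Hypothetical sketch of a size-aware buffer strategy; the constants
// and bufferSizeFor are illustrative, not existing Wagon API.
private static final int MIN_BUFFER = 4 * 1024;    // keep the current 4 KB as the floor
private static final int MAX_BUFFER = 512 * 1024;  // cap the per-event buffer size

private static int bufferSizeFor(long expectedSize) {
    if (expectedSize <= 0) {
        return MIN_BUFFER; // size unknown: fall back to the old default
    }
    // Aim for on the order of a thousand progress events per transfer.
    return (int) Math.max(MIN_BUFFER, Math.min(MAX_BUFFER, expectedSize / 1000));
}

protected void transfer(Resource resource, InputStream input, OutputStream output,
        int requestType, long maxSize) throws IOException {
    TransferEvent transferEvent =
            new TransferEvent(this, resource, TransferEvent.TRANSFER_PROGRESS, requestType);
    byte[] buffer = new byte[bufferSizeFor(maxSize)];
    int filled = 0;
    int n;
    while ((n = input.read(buffer, filled, buffer.length - filled)) != -1) {
        filled += n;
        if (filled == buffer.length) { // buffer full: notify and write once
            fireTransferProgress(transferEvent, buffer, filled);
            output.write(buffer, 0, filled);
            filled = 0;
        }
    }
    if (filled > 0) { // flush the final partial buffer
        fireTransferProgress(transferEvent, buffer, filled);
        output.write(buffer, 0, filled);
    }
}
{code}
> With a 7.5 GB transfer and a 512 KB buffer this would be on the order of fifteen thousand events instead of millions, while the granularity stays fine enough for progress reporting.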



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
