Date: Mon, 15 Oct 2018 10:17:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@maven.apache.org
Reply-To: dev@maven.apache.org
Subject: [jira] [Commented] (WAGON-537) Maven download speed of large artifacts is slow due to unsuitable buffer strategy for remote Artifacts in AbstractWagon

    [ https://issues.apache.org/jira/browse/WAGON-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16650019#comment-16650019 ]

ASF GitHub Bot commented on WAGON-537:
--------------------------------------

olaf-otto commented on a change in pull request #50: WAGON-537 Maven download speed of large artifacts is slow
URL: https://github.com/apache/maven-wagon/pull/50#discussion_r225110395

 ##########
 File path: wagon-provider-api/src/main/java/org/apache/maven/wagon/AbstractWagon.java
 ##########
 @@ -560,31 +564,78 @@ protected void transfer( Resource resource, InputStream input, OutputStream outp
     protected void transfer( Resource resource, InputStream input, OutputStream output, int requestType, long maxSize )
         throws IOException
     {
-        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
+        byte[] buffer = bufferForTransferring( resource );
 
         TransferEvent transferEvent = new TransferEvent( this, resource, TransferEvent.TRANSFER_PROGRESS, requestType );
         transferEvent.setTimestamp( System.currentTimeMillis() );
 
         long remaining = maxSize;
         while ( remaining > 0 )
         {
-            // let's safely cast to int because the min value will be lower than the buffer size.
-            int n = input.read( buffer, 0, (int) Math.min( buffer.length, remaining ) );
+            // Read from the stream, blocking if necessary until either EOF is reached or the buffer is filled.
+            // Filling the buffer has priority, since downstream processors will significantly degrade I/O
+            // performance if called too frequently (large data streams), as they perform expensive tasks such
+            // as console output or data integrity checks.
+            int nextByte = input.read();
 
-            if ( n == -1 )
+            if ( nextByte == -1 )
             {
                 break;
             }
 
-            fireTransferProgress( transferEvent, buffer, n );
+            buffer[0] = ( byte ) nextByte;
+
+            // let's safely cast to int because the min value will be lower than the buffer size.
+            int length = (int) min( buffer.length, remaining ),
+                read = 1;
+
+            for ( ; read < length; ++read )
+            {
+                nextByte = input.read();
+                if ( nextByte == -1 )
+                {
+                    break;
+                }
+                buffer[read] = ( byte ) nextByte;
+            }
+
+            fireTransferProgress( transferEvent, buffer, read );
 
-            output.write( buffer, 0, n );
+            output.write( buffer, 0, read );
 
-            remaining -= n;
+            remaining -= read;
         }
         output.flush();
     }
 
+    /**
+     * Provides a buffer suitably sized for efficiently
+     * {@link #transfer(Resource, InputStream, OutputStream, int, long) transferring}
+     * the given {@link Resource}. For larger files, larger buffers are provided so that downstream
+     * {@link #fireTransferProgress(TransferEvent, byte[], int) listeners} are not notified overly frequently.
+     * For instance, transferring gigabyte-sized resources would result in millions of notifications when using
+     * only a few kilobytes of buffer, drastically slowing the transfer, since transfer progress listeners and
+     * notifications are synchronous and may block, e.g. when writing download progress status to the console.
+     *
+     * @param resource must not be null.
+     * @return a byte buffer suitable for the {@link Resource#getContentLength() content length} of the resource.
+     */
+    protected byte[] bufferForTransferring( Resource resource )
+    {
+        long contentLength = resource.getContentLength();
+
+        if ( contentLength <= 0 )
+        {
+            return new byte[DEFAULT_BUFFER_SIZE];
+        }
+
+        int numberOf4KbSegmentsFor100Chunks = ( ( int ) ( contentLength / ( 4 * 1024 * 100 ) ) );

 Review comment:
   Kindly asking you to specifically review the way the buffer is sized here. The idea is that at least the old buffer size (DEFAULT_BUFFER_SIZE) is used and that buffers should have a size of N * 4 KB. As a rule of thumb, the buffer should be sized such that the data is transferred in at least 100 chunks. I've capped the buffer at MAXIMUM_BUFFER_SIZE (512 KB). As a result, most transferred files (metadata such as POMs and checksums, small JARs, ...) are transferred with the old buffer size, while larger files are transferred with up to 512 KB of buffer. For a 10 GB stream, this reduces the number of chunks from more than 5 million (the previous strategy almost never used the entire 4 KB buffer, typically less than half of it) to about 20 thousand.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
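To make the sizing rule described in the review comment concrete, here is a minimal, self-contained sketch of one way such a heuristic could be written. This is an illustration, not the pull request's code (the diff above is cut off right after the numberOf4KbSegmentsFor100Chunks line); the constant values and the clamping order are assumptions based on the comment.

    // Sketch only: assumed realization of the "N * 4 KB, about 100 chunks, capped" rule.
    class BufferSizingSketch
    {
        static final int DEFAULT_BUFFER_SIZE = 4 * 1024;   // assumed to match Wagon's historic 4 KB buffer
        static final int MAXIMUM_BUFFER_SIZE = 512 * 1024; // cap named in the review comment

        static byte[] bufferFor( long contentLength )
        {
            if ( contentLength <= 0 )
            {
                // Unknown content length: keep the old default.
                return new byte[DEFAULT_BUFFER_SIZE];
            }
            // Use N * 4 KB segments so that the stream is delivered in roughly 100 chunks...
            long size = ( contentLength / ( 4L * 1024 * 100 ) ) * 4 * 1024;
            // ...but never less than the old default and never more than the cap.
            return new byte[(int) Math.max( DEFAULT_BUFFER_SIZE, Math.min( MAXIMUM_BUFFER_SIZE, size ) )];
        }
    }

Under these assumptions, a 100 KB POM or checksum still gets the 4 KB default, while a 10 GB artifact gets the 512 KB cap, i.e. roughly 20,000 progress notifications instead of several million.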
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Maven download speed of large artifacts is slow due to unsuitable buffer strategy for remote Artifacts in AbstractWagon
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: WAGON-537
>                 URL: https://issues.apache.org/jira/browse/WAGON-537
>             Project: Maven Wagon
>          Issue Type: Improvement
>          Components: wagon-provider-api
>    Affects Versions: 3.2.0
>        Environment: Windows 10, JDK 1.8, Nexus Artifact store, 100 MB/s network connection
>            Reporter: Olaf Otto
>            Priority: Major
>              Labels: perfomance
>        Attachments: wagon-issue.png
>
>
> We are using Maven for build process automation with Docker. This sometimes involves downloading images a few gigabytes in size. Here, Maven's download speed is consistently and reproducibly slow. For instance, an artifact 7.5 GB in size took almost two hours to transfer, despite a 100 MB/s connection and a correspondingly fast, reproducible download speed from the remote Nexus artifact repository when downloading with a browser.
> I have investigated the issue using JProfiler. The result clearly shows a significant issue in AbstractWagon's transfer( Resource resource, InputStream input, OutputStream output, int requestType, long maxSize ) method used for remote artifacts.
> Here, the input stream is read in a loop using a 4 KB buffer. Whenever data is received, it is pushed to downstream listeners via fireTransferProgress. These listeners (or rather consumers) perform expensive tasks such as checksumming or printing to the console.
> Now, the underlying InputStream implementation used in transfer returns from read( buffer, offset, length ) as soon as *some* data is available. That is, fireTransferProgress is invoked with, on average, fewer bytes than half the buffer capacity (this varies with the underlying network and hardware architecture). Consequently, fireTransferProgress is invoked *millions of times* for large files. As this is a blocking operation, the time spent in fireTransferProgress dominates and slows the transfer down by at least one order of magnitude.
> !wagon-issue.png!
> In our case, the download time went from a theoretical optimum of roughly 80 seconds to more than 3,200 seconds.
> From an architectural perspective, I would not want to make the consumers / listeners invoked via fireTransferProgress aware of their potential impact on download speed, but rather refactor the transfer method to use a buffer strategy that reduces the number of fireTransferProgress invocations. This should be done with regard to the expected file size of the transfer, so that fireTransferProgress is invoked often enough but not too frequently.
> I have implemented a solution and transfer speed went up by more than one order of magnitude. I will provide a pull request asap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
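The issue description above argues for the fix in prose; the following standalone sketch shows the general idea in isolation: notify progress listeners once per filled buffer (or at EOF) rather than once per read() call. ProgressListener and notifyProgress are hypothetical placeholders for Wagon's synchronous fireTransferProgress listeners, and the loop uses bulk reads for brevity; this is not the code from the pull request.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Sketch of the "fill the buffer before notifying" idea from the issue description.
    final class BufferedCopySketch
    {
        // Hypothetical stand-in for an expensive, synchronous listener such as
        // Wagon's fireTransferProgress consumers (checksumming, console output, ...).
        interface ProgressListener
        {
            void notifyProgress( byte[] chunk, int length );
        }

        static void copy( InputStream in, OutputStream out, int bufferSize, ProgressListener listener )
            throws IOException
        {
            byte[] buffer = new byte[bufferSize];
            int filled = 0;
            int read;
            // Accumulate data until the buffer is full before invoking the listener,
            // instead of notifying on every partial read the stream happens to return.
            while ( ( read = in.read( buffer, filled, buffer.length - filled ) ) != -1 )
            {
                filled += read;
                if ( filled == buffer.length )
                {
                    listener.notifyProgress( buffer, filled );
                    out.write( buffer, 0, filled );
                    filled = 0;
                }
            }
            if ( filled > 0 )
            {
                // Flush the final partial buffer at end of stream.
                listener.notifyProgress( buffer, filled );
                out.write( buffer, 0, filled );
            }
            out.flush();
        }
    }

With a 512 KB buffer, a 7.5 GB download like the one profiled above triggers about 15,000 listener callbacks instead of the millions observed with partially filled 4 KB reads.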