hc-httpclient-users mailing list archives

From Michael Osipov <micha...@apache.org>
Subject Re: Cannot saturate LAN connection with HttpClient
Date Sun, 24 May 2015 12:44:05 GMT
On 2015-05-24 at 14:25, Oleg Kalnichevski wrote:
> On Sun, 2015-05-24 at 13:02 +0200, Michael Osipov wrote:
>> On 2015-05-24 at 12:17, Oleg Kalnichevski wrote:
>>> On Sun, 2015-05-24 at 00:29 +0200, Michael Osipov wrote:
>>>> On 2015-05-23 at 22:29, Oleg Kalnichevski wrote:
>>>>> On Sat, 2015-05-23 at 22:09 +0200, Michael Osipov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> we are experiencing a (slight) performance problem with HttpClient 4.4.1
>>>>>> while downloading big files from a remote server in the corporate intranet.
>>>>>>
>>>>>> A simple test client:
>>>>>> HttpClientBuilder builder = HttpClientBuilder.create();
>>>>>> try (CloseableHttpClient client = builder.build()) {
>>>>>>       HttpGet get = new HttpGet("...");
>>>>>>       long start = System.nanoTime();
>>>>>>       HttpResponse response = client.execute(get);
>>>>>>       HttpEntity entity = response.getEntity();
>>>>>>
>>>>>>       File file = File.createTempFile("prefix", null);
>>>>>>       try (OutputStream os = new FileOutputStream(file)) {
>>>>>>           entity.writeTo(os);
>>>>>>       }
>>>>>>       long stop = System.nanoTime();
>>>>>>       long contentLength = file.length();
>>>>>>
>>>>>>       long diff = stop - start;
>>>>>>       System.out.printf("Duration: %d ms%n", TimeUnit.NANOSECONDS.toMillis(diff));
>>>>>>       System.out.printf("Size: %d%n", contentLength);
>>>>>>
>>>>>>       float speed = contentLength / (float) diff * (1_000_000_000 / 1_000_000);
>>>>>>
>>>>>>       System.out.printf("Speed: %.2f MB/s%n", speed);
>>>>>> }
>>>>>>
>>>>>> After at least 10 repetitions I see that the 182 MB file is downloaded
>>>>>> within 24 000 ms at about 8 MB/s max. I cannot top that.
>>>>>>
>>>>>> I have tried this over and over again with curl and see that curl is
>>>>>> able to saturate the entire LAN connection (100 Mbit/s).
>>>>>>
>>>>>> My tests are done on Windows 7 64 bit, JDK 7u67 32 bit.
>>>>>>
>>>>>> Any idea what the bottleneck might be?
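
For reference, the throughput arithmetic in the quoted snippet checks out: bytes per nanosecond times (10^9 / 10^6) gives MB/s in decimal megabytes. A standalone check with the figures reported above (182 MB, 24 000 ms; values rounded, not taken from an actual run):

```java
public class SpeedCheck {
    public static void main(String[] args) {
        // Figures assumed from the measurement above: ~182 MB in ~24 000 ms.
        long contentLength = 182_000_000L; // bytes
        long diff = 24_000_000_000L;       // nanoseconds

        // bytes/ns * (1e9 ns/s / 1e6 bytes/MB) = MB/s (decimal megabytes)
        float speed = contentLength / (float) diff * (1_000_000_000 / 1_000_000);
        System.out.printf("Speed: %.2f MB/s%n", speed); // ~7.58 MB/s
    }
}
```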
>>>>
>>>> Thanks for the quick response.
>>>>
>>>>> (1) Curl should be using zero copy file transfer, which Java blocking
>>>>> I/O does not support. HttpAsyncClient, on the other hand, supports zero
>>>>> copy file transfer and generally tends to perform better when writing
>>>>> content out directly to disk.
>>>>
>>>> I did try this [1] example and my heap exploded. After increasing it to
>>>> -Xmx1024M, it did saturate the entire connection.
>>>>
>>>
>>> This sounds wrong. The example below does not use zero copy (with zero
>>> copy there should be no heap memory allocation at all).
>>>
>>> This example demonstrates how to use zero copy file transfer
>>>
>>> http://hc.apache.org/httpcomponents-asyncclient-4.1.x/httpasyncclient/examples/org/apache/http/examples/nio/client/ZeroCopyHttpExchange.java
>>
>> I have seen this example, but there is no ZeroCopyGet. I haven't found
>> any example which explicitly shows zero copy for GETs. The example
>> from [1] did work, but with the heap explosion. What did I do wrong here?
>>
>
> Zero copy can be employed only if a message encloses an entity in it.
> Therefore there is no such thing as ZeroCopyGet in HC. One can execute a
> normal GET request and use a ZeroCopyConsumer to stream content out
> directly to a file without any intermediate buffering in memory.

OK, that has confirmed my assumptions.
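
To spell that out for the archive, what Oleg describes looks roughly like this: a plain GET executed with a ZeroCopyConsumer, following the linked ZeroCopyHttpExchange example. A sketch, assuming httpasyncclient 4.1 on the classpath; the URL is a placeholder:

```java
import java.io.File;
import org.apache.http.HttpResponse;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
import org.apache.http.impl.nio.client.HttpAsyncClients;
import org.apache.http.nio.client.methods.HttpAsyncMethods;
import org.apache.http.nio.client.methods.ZeroCopyConsumer;

public class ZeroCopyGetSketch {
    public static void main(String[] args) throws Exception {
        try (CloseableHttpAsyncClient client = HttpAsyncClients.createDefault()) {
            client.start();
            File download = File.createTempFile("prefix", null);
            // ZeroCopyConsumer streams the response body straight to the file
            // channel, without intermediate buffering on the heap.
            ZeroCopyConsumer<File> consumer = new ZeroCopyConsumer<File>(download) {
                @Override
                protected File process(HttpResponse response, File file,
                        ContentType contentType) {
                    return file;
                }
            };
            File result = client.execute(
                    HttpAsyncMethods.createGet("http://example.com/big-file"),
                    consumer, null).get();
            System.out.println("Downloaded " + result.length() + " bytes");
        }
    }
}
```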

>>>>> (2) Use larger socket / intermediate buffers. Default buffer size used
>>>>> by Entity implementations is most likely suboptimal.
>>>>
>>>> That did not make any difference. I have changed:
>>>>
>>>> 1. Socket receive size
>>>> 2. Employed a buffered input stream
>>>> 3. Manually copied the stream to a file
>>>>
>>>> I have varied the buffer size from 2^14 to 2^20 bytes, to no avail.
>>>> Regardless of this, your tip with zero copy helped me a lot.
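
For the record, the knobs I changed look roughly like this (a sketch against HttpClient/HttpCore 4.4; the 64 KiB values are just one of the sizes I tried, not a tuned recommendation):

```java
import org.apache.http.config.ConnectionConfig;
import org.apache.http.config.SocketConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;

public class BufferTuning {
    static CloseableHttpClient newClient() {
        return HttpClientBuilder.create()
                .setDefaultSocketConfig(SocketConfig.custom()
                        .setRcvBufSize(64 * 1024)   // SO_RCVBUF hint (HttpCore 4.4+)
                        .build())
                .setDefaultConnectionConfig(ConnectionConfig.custom()
                        .setBufferSize(64 * 1024)   // internal session buffer size
                        .build())
                .build();
    }
}
```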
>>>>
>>>> Unfortunately, this is just a little piece in a performance degradation
>>>> chain a colleague has figured out. HttpClient acts as an intermediary in
>>>> a webapp which receives a request via REST from a client, processes it,
>>>> and opens up the stream to the huge files from a remote server. Without
>>>> caching the files to disk, I am passing the Entity#getContent stream
>>>> back to the client. The degradation is about 75 %.
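
The pass-through itself is nothing more than a buffered copy loop; a minimal sketch with a hypothetical `pipe` helper (in the webapp the source would be Entity#getContent and the sink the servlet response stream; here both are in-memory stand-ins):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamPipe {
    /** Copies in to out through a fixed-size buffer; returns bytes copied. */
    static long pipe(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[1 << 20]; // 1 MiB stands in for the entity body
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = pipe(new ByteArrayInputStream(payload), sink, 16 * 1024);
        System.out.println(copied); // 1048576
    }
}
```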
>>>>
>>>> After rethinking your tips, I just checked the servers I am pulling
>>>> data from. One is slow, the other one is fast. Transfer speed when piping
>>>> the streams from the fast server remains at 8 MB/s, which is what I wanted
>>>> after I had identified an issue with my custom HttpResponseInputStream.
>>>>
>>>> I modified my code to use the async client and it seems to pipe at
>>>> maximum LAN speed, though it looks weird with curl now: curl blocks for
>>>> 15 seconds, and then the entire stream is written to disk within a second.
>>>>
>>>
>>> It all sounds very bizarre. I see no reason why HttpAsyncClient without
>>> zero copy transfer should do any better than HttpClient in this
>>> scenario.
>>
>> So you are saying something is probably wrong with my client setup?
>>
>
> I think it is not unlikely.

Assuming I had the time to investigate that, I currently have no idea 
where to start looking for the mismatch.


---------------------------------------------------------------------
To unsubscribe, e-mail: httpclient-users-unsubscribe@hc.apache.org
For additional commands, e-mail: httpclient-users-help@hc.apache.org

