commons-issues mailing list archives

From "Thomas Neidhart (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (IO-468) Avoid allocating memory for method internal buffers, use threadlocal memory instead
Date Mon, 09 Feb 2015 20:58:35 GMT

    [ https://issues.apache.org/jira/browse/IO-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14312870#comment-14312870 ]

Thomas Neidhart commented on IO-468:
------------------------------------

Here are some benchmarks I ran with my own test harness. Each number is the count of executions
of the test code within 100 ms, averaged over a total of 10 runs.
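
For reference, a measurement loop of this kind can be sketched as follows. This is not the harness that produced the numbers in this comment; the class name ThroughputHarness, its method names, the number of warm-up rounds, and the choice of the sample standard deviation are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical harness: counts executions of a task inside a fixed time window.
class ThroughputHarness {

    /** Runs the task repeatedly and returns how many runs finished within windowMillis. */
    static long countExecutions(Runnable task, long windowMillis) {
        long deadline = System.nanoTime() + windowMillis * 1_000_000L;
        long count = 0;
        while (System.nanoTime() < deadline) {
            task.run();
            count++;
        }
        return count;
    }

    static double mean(long[] samples) {
        double sum = 0.0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1); an assumption, not taken from the source. */
    static double stdev(long[] samples) {
        if (samples.length < 2) {
            return 0.0;
        }
        double m = mean(samples);
        double sumSq = 0.0;
        for (long s : samples) {
            double d = s - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (samples.length - 1));
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        // The task creates fresh streams on every execution, so each run copies all bytes.
        Runnable task = () -> {
            ByteArrayInputStream in = new ByteArrayInputStream(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        };

        // Warm-up rounds so the JIT has compiled the hot loop before measuring.
        for (int i = 0; i < 5; i++) {
            countExecutions(task, 100);
        }

        long[] samples = new long[10];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = countExecutions(task, 100);
        }
        System.out.printf("mean=%.0f stdev=%.0f%n", mean(samples), stdev(samples));
    }
}
```

A hand-rolled loop like this is sensitive to JIT and GC noise; a dedicated benchmark framework would give more trustworthy numbers.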

Copying a ByteArrayInputStream:

{noformat}
Stream Length=100
=====================================================
Method                        mean        stdev
-----------------------------------------------------
copy(i, o)                    493703      34613
copyLarge(i, o, arr)          12205585    234018
copyLarge(i, o, tl.get())     10590205    206625
Diff tl/arr                   0.87x
Diff tl/plain                 21.45x
=====================================================

Stream Length=1000
=====================================================
Method                        mean        stdev
-----------------------------------------------------
copy(i, o)                    502246      11686
copyLarge(i, o, arr)          5553711     159619
copyLarge(i, o, tl.get())     4880272     232972
Diff tl/arr                   0.88x
Diff tl/plain                 9.72x
=====================================================

Stream Length=10000
=====================================================
Method                        mean        stdev
-----------------------------------------------------
copy(i, o)                    317060      4488
copyLarge(i, o, arr)          253169      12052
copyLarge(i, o, tl.get())     522264      12864
Diff tl/arr                   2.06x
Diff tl/plain                 1.65x
=====================================================

Stream Length=100000
=====================================================
Method                        mean        stdev
-----------------------------------------------------
copy(i, o)                    47718       392
copyLarge(i, o, arr)          52298       447
copyLarge(i, o, tl.get())     51703       907
Diff tl/arr                   0.99x
Diff tl/plain                 1.08x
=====================================================

Stream Length=1000000
=====================================================
Method                        mean        stdev
-----------------------------------------------------
copy(i, o)                    4396        310
copyLarge(i, o, arr)          4483        420
copyLarge(i, o, tl.get())     4646        87
Diff tl/arr                   1.04x
Diff tl/plain                 1.06x
=====================================================
{noformat}

Reading a 3 MB file into memory:
{noformat}
=====================================================
Method                        mean        stdev
-----------------------------------------------------
copy(i, o)                    238         4
copyLarge(i, o, arr)          248         4
copyLarge(i, o, tl.get())     250         4
Diff tl/arr                   1.01x
Diff tl/plain                 1.05x
=====================================================
{noformat}

The performance clearly depends on whether the stream copying is IO-bound or not.
Even though I took care of warm-up runs, noise during execution can affect the results
quite a lot, as you can see from the standard deviation and from the fact that sometimes the
ThreadLocal version is faster and sometimes the array version. So I would not trust my own
benchmark too much in this regard; I just wanted to quickly disprove your benchmark.

The reason you see such amazing speedups is simply that you do not copy the streams correctly:

{code}
        @Override
        public void run()
        {
            for(int i = 0; i < runs; i++){
                try {
                    IOUtils.copy(inputStream, outputStream);
                } catch (IOException e) {
                    System.err.println(e.getMessage());
                }
            }
        }
{code}

You just call copy again and again on the same streams without resetting or re-initializing
them. This means that after the first copy, all subsequent calls return immediately because
the input stream is already exhausted. So the test results are simply wrong.
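
The effect can be reproduced with a minimal sketch. The copy method below is a hand-written stand-in for IOUtils.copy (an assumption: a plain read-until-EOF loop with a 4096-byte buffer), and the class name ExhaustedStreamDemo is hypothetical; it only illustrates why the input has to be reset or recreated before each iteration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ExhaustedStreamDemo {

    /** Stand-in for IOUtils.copy: copies until EOF and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[10_000]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        long first = copy(in, out);   // reads all 10000 bytes
        long second = copy(in, out);  // input is at EOF, so nothing is copied

        // Rewind the input (the mark defaults to position 0) and clear the output
        // so that the next iteration does real work again.
        in.reset();
        out.reset();
        long third = copy(in, out);

        System.out.println(first + " " + second + " " + third); // prints "10000 0 10000"
    }
}
```

For streams that do not support reset(), such as a FileInputStream, the benchmark loop would have to open a new stream on every iteration instead.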

> Avoid allocating memory for method internal buffers, use threadlocal memory instead
> -----------------------------------------------------------------------------------
>
>                 Key: IO-468
>                 URL: https://issues.apache.org/jira/browse/IO-468
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Utilities
>    Affects Versions: 2.4
>         Environment: all environments
>            Reporter: Bernd Hopp
>            Priority: Minor
>              Labels: newbie, performance
>             Fix For: 2.5
>
>         Attachments: PerfTest.java, monitoring_with_threadlocals.png, monitoring_without_threadlocals.png, performancetest.ods, performancetest_weakreference.ods
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> In a lot of places, we allocate new buffers dynamically via new byte[]. This is a performance
> drawback since many of these allocations could be avoided if we used thread-local buffers
> that can be reused. For example, consider the following code from IOUtils.java, ln 2177:
> return copyLarge(input, output, inputOffset, length, new byte[DEFAULT_BUFFER_SIZE]);
> This code allocates new memory for every copy process. That memory is not used outside of the
> method and could easily and safely be reused, as long as it is thread-local. So instead of
> allocating new memory, a new utility class could provide a thread-local byte array like this:
> byte[] buffer = ThreadLocalByteArray.ofSize(DEFAULT_BUFFER_SIZE);
> return copyLarge(input, output, inputOffset, length, buffer);
> I have not measured the performance benefits yet, but I would expect them to be significant,
> especially when the streams themselves are not the performance bottleneck.
> Git PR is at https://github.com/apache/commons-io/pull/6/files
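
For readers without the PR at hand, the thread-local buffer idea from the issue description can be sketched roughly as below. The name ThreadLocalByteArray.ofSize comes from the description; the body is an assumption made for illustration and is not the code in the pull request.

```java
// Hypothetical implementation of the utility named in the issue description.
class ThreadLocalByteArray {

    // One buffer per thread; starts empty and grows on demand.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[0]);

    /**
     * Returns this thread's buffer, growing it if it is smaller than size.
     * The array may be longer than size and may contain stale data from earlier use.
     */
    static byte[] ofSize(int size) {
        byte[] buf = BUFFER.get();
        if (buf.length < size) {
            buf = new byte[size];
            BUFFER.set(buf);
        }
        return buf;
    }

    public static void main(String[] args) {
        byte[] a = ofSize(4096);
        byte[] b = ofSize(4096);
        System.out.println(a == b); // prints "true": the same thread gets the same array back
    }
}
```

A buffer shared this way is only safe if a method never calls, while still using the buffer, another method on the same thread that requests it too; the buffer also stays reachable for the lifetime of the thread.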



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
