commons-dev mailing list archives

From Emmanuel Bourg <ebo...@apache.org>
Subject Re: [csv] Performance comparison
Date Tue, 13 Mar 2012 00:12:36 GMT
I kept tinkering with ExtendedBufferedReader and I have some interesting results.

First I tried to simplify it by extending java.io.LineNumberReader 
instead of BufferedReader. The performance decreased by 20%, probably 
because the class is synchronized internally.

But wait, isn't BufferedReader also synchronized? I copied the code of 
BufferedReader and removed the synchronized blocks. Now the time to 
parse the file is down to 2652 ms, 28% faster than previously!
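
For illustration, the idea of a buffered reader without any locking can be sketched as below. This is a simplified stand-in written from scratch, not the JDK or Harmony code: the class name UnsyncBufferedReader and its buffer-size parameter are hypothetical, and mark/reset and readLine are left out. It is only safe when a single thread uses the reader.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical, simplified buffered reader with no synchronized blocks. */
final class UnsyncBufferedReader extends Reader {
    private final Reader in;
    private final char[] buf;
    private int pos; // index of the next char to hand out
    private int end; // number of valid chars currently in buf

    UnsyncBufferedReader(Reader in, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.in = in;
        this.buf = new char[size];
    }

    /** Refills the buffer; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n;
        do {
            n = in.read(buf, 0, buf.length);
        } while (n == 0);
        if (n < 0) {
            return false;
        }
        pos = 0;
        end = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        if (pos >= end && !fill()) {
            return -1;
        }
        return buf[pos++];
    }

    @Override
    public int read(char[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= end && !fill()) {
            return -1;
        }
        // Hand out at most what is buffered; callers loop for more.
        int n = Math.min(len, end - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

The trade-off is the usual one: dropping the lock removes per-call overhead on the hot read() path, at the cost of thread safety.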

Of course the code of BufferedReader can't be copied from the JDK due to 
the license mismatch, so I took the version from Harmony. On my test it 
is about 4% faster than the JDK counterpart, and the parsing time is now 
around 2553 ms.
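
Timings like these only mean something once the JIT has warmed up, as noted in the original comparison. A rough sketch of how such a measurement could be taken follows; WarmupTimer and its method names are hypothetical, and this is not the harness that produced the numbers in this thread.

```java
/** Hypothetical helper: time a task after untimed warm-up iterations. */
final class WarmupTimer {
    private WarmupTimer() {
    }

    /**
     * Runs the task 'warmup' times without timing it, then returns the
     * best of 'runs' timed executions in milliseconds (runs must be >= 1).
     */
    static long bestMillis(Runnable task, int warmup, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        for (int i = 0; i < warmup; i++) {
            task.run();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best / 1_000_000;
    }

    /** Overhead of 'ms' relative to 'baselineMs', in percent. */
    static double percentSlower(long ms, long baselineMs) {
        return 100.0 * (ms - baselineMs) / baselineMs;
    }
}
```

With the figures from the original comparison, percentSlower(3562, 3328) gives roughly 7.0, matching the "+7%" reported for Super CSV against Java CSV.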

Now Commons CSV can start claiming to be the fastest CSV parser around :)

Emmanuel Bourg


On 12/03/2012 11:31, Emmanuel Bourg wrote:
> I have identified the performance killer: it's the
> ExtendedBufferedReader. It implements complex logic to fetch one
> character ahead, but this extra character is rarely used. I have
> implemented a simpler look-ahead using mark/reset, as suggested by Bob
> Smith in CSV-42, and the performance improved by 30%.
>
> Now the parsing is down to 3406 ms, and that's almost without touching
> the parser yet.
>
> Emmanuel Bourg
>
>
> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>> Hi,
>>
>> I compared the performance of Commons CSV with the other CSV parsers
>> available. I took the world cities file from Maxmind as a test file [1].
>> It's a big file of 130 MB with 2.8 million records.
>>
>> Here are the results obtained on a Core 2 Duo E8400 after several
>> iterations to let the JIT compiler kick in:
>>
>> Direct read     750 ms
>> Java CSV       3328 ms
>> Super CSV      3562 ms (+7%)
>> OpenCSV        3609 ms (+8.4%)
>> GenJava CSV    3844 ms (+15.5%)
>> Commons CSV    4656 ms (+39.9%)
>> Skife CSV      4813 ms (+44.6%)
>>
>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how to
>> use them.
>>
>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>> room for improvement. The memory usage will have to be compared too;
>> I'm looking for a way to measure it.
>>
>>
>> Emmanuel Bourg
>>
>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>>
>
>
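
The mark/reset look-ahead mentioned in the quoted message of 12/03/2012 can be sketched roughly as follows. This is a minimal illustration built on the standard BufferedReader mark(int) and reset() methods, not necessarily the code that went into Commons CSV; the class name LookAheadReader is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical reader that can peek at the next character. */
final class LookAheadReader extends BufferedReader {

    LookAheadReader(Reader in) {
        super(in);
    }

    /**
     * Returns the next character without consuming it,
     * or -1 at end of stream.
     */
    int lookAhead() throws IOException {
        mark(1);        // remember the current position, allow 1 char of read-ahead
        int c = read(); // read one char (or -1 at end of stream)
        reset();        // go back to the marked position
        return c;
    }
}
```

A parser would typically call lookAhead() only in the rare cases where it needs the next character, for example to check whether a '\r' is followed by '\n', so the common read() path carries no extra bookkeeping.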


