commons-dev mailing list archives

From Emmanuel Bourg <ebo...@apache.org>
Subject Re: [csv] Performance comparison
Date Mon, 12 Mar 2012 10:31:38 GMT
I have identified the performance killer: it's the 
ExtendedBufferedReader. It implements complex logic to fetch one 
character ahead, but this extra character is rarely used. I have 
implemented a simpler look ahead using mark/reset, as suggested by Bob 
Smith in CSV-42, and the performance improved by 30%.
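
For reference, here is a minimal sketch of the mark/reset look ahead idea 
(this is not the actual ExtendedBufferedReader replacement, just an 
illustration on a plain BufferedReader):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LookAheadDemo {

    // Peek at the next character without consuming it, using mark/reset.
    // Requires a reader that supports mark(), e.g. BufferedReader.
    static int lookAhead(BufferedReader reader) throws IOException {
        reader.mark(1);        // remember the current position, 1 char read-ahead limit
        int c = reader.read(); // next character, or -1 at end of stream
        reader.reset();        // rewind so the character is still available to read()
        return c;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a,b,c"));
        System.out.println((char) lookAhead(reader)); // prints 'a'
        System.out.println((char) reader.read());     // still reads 'a'
    }
}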

Now the parsing is down to 3406 ms, and that's almost without touching 
the parser yet.
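
For what it's worth, this figure was presumably measured the same way as the 
numbers in the quoted message below, i.e. after several warm-up iterations 
so the JIT compiler kicks in. A rough harness along these lines would do; 
the class name and the parseAll placeholder are made up for illustration, 
not the actual benchmark code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Reader;

public class CsvBenchmark {

    public static void main(String[] args) throws Exception {
        String file = "worldcitiespop.txt"; // the Maxmind test file [1]

        // Warm-up passes so the JIT compiler optimizes the hot parsing code.
        for (int i = 0; i < 5; i++) {
            parseAll(file);
        }

        // Timed pass.
        long start = System.nanoTime();
        int records = parseAll(file);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println(records + " records in " + elapsedMs + " ms");
    }

    // Placeholder for the parser under test; swap in Commons CSV, OpenCSV,
    // etc. Counting line breaks here stands in for the "direct read" baseline.
    private static int parseAll(String file) throws Exception {
        int count = 0;
        try (Reader in = new BufferedReader(new FileReader(file))) {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '\n') {
                    count++;
                }
            }
        }
        return count;
    }
}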

Emmanuel Bourg


On 11/03/2012 15:05, Emmanuel Bourg wrote:
> Hi,
>
> I compared the performance of Commons CSV with the other CSV parsers
> available. I took the world cities file from Maxmind as a test file [1];
> it's a big file of 130 MB with 2.8 million records.
>
> Here are the results obtained on a Core 2 Duo E8400 after several
> iterations to let the JIT compiler kick in:
>
> Direct read    750 ms
> Java CSV      3328 ms
> Super CSV     3562 ms (+7%)
> OpenCSV       3609 ms (+8.4%)
> GenJava CSV   3844 ms (+15.5%)
> Commons CSV   4656 ms (+39.9%)
> Skife CSV     4813 ms (+44.6%)
>
> I also tried Nuiton CSV and Esperio CSV, but I couldn't figure out how
> to use them.
>
> I haven't analyzed why Commons CSV is slower yet, but it seems there is
> room for improvement. The memory usage will have to be compared too;
> I'm looking for a way to measure it.
>
>
> Emmanuel Bourg
>
> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>
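
Regarding the memory measurement question in the quote above: one common, 
if rough, approach is to snapshot heap usage before and after parsing, 
after hinting a GC. Just a sketch of that idea, not necessarily what the 
comparison will end up using:

public class MemoryProbe {

    // Rough heap-usage snapshot: used = total - free, after suggesting a GC.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint, so figures are approximate
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();
        // ... run the parser here and keep the parsed records reachable ...
        long after = usedHeap();
        System.out.println("Approximate heap used: " + (after - before) + " bytes");
    }
}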


