commons-dev mailing list archives

From Benedikt Ritter <benerit...@googlemail.com>
Subject Re: [csv] Performance comparison
Date Mon, 12 Mar 2012 16:03:56 GMT
On 12 March 2012 11:31, Emmanuel Bourg <ebourg@apache.org> wrote:
> I have identified the performance killer: it's the ExtendedBufferedReader.
> It implements a complex logic to fetch one character ahead, but this extra
> character is rarely used. I have implemented a simpler look ahead using
> mark/reset as suggested by Bob Smith in CSV-42 and the performance improved
> by 30%.
>
> Now the parsing is down to 3406 ms, and that's almost without touching the
> parser yet.
>

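For reference, a look-ahead based on mark/reset could look roughly like the sketch below. It only uses the standard java.io.BufferedReader API; the class name LookAheadDemo and the method lookAhead() are hypothetical names chosen for illustration and are not taken from the Commons CSV sources or from CSV-42.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class LookAheadDemo {

    /**
     * Returns the next character without consuming it, or -1 at end of stream.
     * Hypothetical helper, not the actual ExtendedBufferedReader code.
     */
    public static int lookAhead(BufferedReader in) throws IOException {
        in.mark(1);         // remember the position; one character of read-ahead is enough
        int c = in.read();  // read the next character (or -1 at end of stream)
        in.reset();         // rewind so the next read() returns the same character again
        return c;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a,b"));
        System.out.println((char) lookAhead(in)); // peeks 'a'
        System.out.println((char) in.read());     // still reads 'a'
        System.out.println((char) in.read());     // then ','
    }
}
```

The point of this approach is that the extra character is only fetched when a caller actually asks for it, instead of on every read().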
Great work, Emmanuel!

Looking at my profiler, I can say that 70% of the time is spent in
ExtendedBufferedReader.read(). This is no wonder, since read() is the
method that does the actual work. However, we should try to minimize
calls to read(). For example, isEndOfLine() calls read() two times,
and isEndOfLine() gets called 5 times by CSVLexer.nextToken() and
its submethods.
The whole logic behind CSVLexer.nextToken() is very hard to read
(IMHO). Maybe some refactoring would help to make it easier to
identify bottlenecks?
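
One possible way to cut down on read() calls, sketched under assumptions: the caller passes the character it has already read into the end-of-line check, so the check itself only touches the reader when it sees a '\r' and has to look for a following '\n'. The class EndOfLineDemo and the signatures of isEndOfLine() and countLines() below are hypothetical and do not reflect the current CSVLexer code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class EndOfLineDemo {

    /**
     * Tells whether c (already read by the caller) ends a line.
     * Only a '\r' triggers one extra read(), to swallow a following '\n'.
     */
    public static boolean isEndOfLine(int c, BufferedReader in) throws IOException {
        if (c == '\r') {
            in.mark(1);
            if (in.read() != '\n') {
                in.reset(); // not a CRLF pair: put the character back
            }
            return true;
        }
        return c == '\n';
    }

    /** Counts lines, reading every character exactly once in the main loop. */
    public static int countLines(String text) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(text));
        int lines = 0;
        boolean pending = false; // true while the current line has unterminated content
        int c;
        while ((c = in.read()) != -1) {
            if (isEndOfLine(c, in)) {
                lines++;
                pending = false;
            } else {
                pending = true;
            }
        }
        return pending ? lines + 1 : lines;
    }

    public static void main(String[] args) throws IOException {
        // Mixed terminators: CRLF, LF, CR, and a last line without terminator
        System.out.println(countLines("a,b\r\nc,d\ne,f\rg")); // prints 4
    }
}
```

With this shape, ordinary characters cost a single read() each, and the check can be called several times on the same character without touching the reader again unless that character is '\r'.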

Benedikt

> Emmanuel Bourg
>
>
> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>
>> Hi,
>>
>> I compared the performance of Commons CSV with the other CSV parsers
>> available. I took the world cities file from Maxmind as a test file [1],
>> it's a big file of 130M with 2.8 million records.
>>
>> Here are the results obtained on a Core 2 Duo E8400 after several
>> iterations to let the JIT compiler kick in:
>>
>> Direct read     750 ms
>> Java CSV       3328 ms
>> Super CSV      3562 ms  (+7%)
>> OpenCSV        3609 ms  (+8.4%)
>> GenJava CSV    3844 ms  (+15.5%)
>> Commons CSV    4656 ms  (+39.9%)
>> Skife CSV      4813 ms  (+44.6%)
>>
>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how to
>> use them.
>>
>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>> room for improvements. The memory usage will have to be compared too,
>> I'm looking for a way to measure it.
>>
>>
>> Emmanuel Bourg
>>
>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

