commons-dev mailing list archives

From James Carman <jcar...@carmanconsulting.com>
Subject Re: [csv] Performance comparison
Date Mon, 12 Mar 2012 16:28:47 GMT
Would one of the parser libraries not work here?
On Mar 12, 2012 12:22 PM, "Emmanuel Bourg" <ebourg@apache.org> wrote:

> On 12/03/2012 17:03, Benedikt Ritter wrote:
>
>> The whole logic behind CSVLexer.nextToken() is very hard to read
>> (IMHO). Maybe some refactoring would help to make it easier to
>> identify bottlenecks?
>>
>
> Yes, I have started investigating in this direction. I filed a few bugs
> about the escaping behavior, with the aim of clarifying the parser.
>
> I think the nextToken() method should be broken into smaller methods to
> help the JIT compiler.
>
> The JIT does some surprising things: I found that even unused code
> branches can affect performance. For example, if simpleTokenLexer()
> is changed to not support escaped characters, performance improves
> by 10% (the input contains no escaped characters). And that's not merely
> because an if statement was removed: if I add a System.out.println() to
> this if block, which is never executed, performance improves as well.
>
> So any change to the parser will have to be carefully tested. Innocent
> changes can have a significant impact.
>
>
> Emmanuel Bourg
>
>
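To make the suggestion of breaking nextToken() into smaller methods concrete, here is a minimal Java sketch of a lexer whose nextToken() only dispatches to small per-case methods. Everything in it is hypothetical: the class SplitLexer, its methods, the comma delimiter, the double-quote encapsulator and the backslash escape character are assumptions for illustration, not the commons-csv CSVLexer code. The escape handling sits in its own method, readEscape(), so input without escaped characters never runs it. HotSpot decides what to inline partly from a method's bytecode size (see the -XX:MaxInlineSize and -XX:FreqInlineSize options), which is one plausible reason such splits can help; whether it actually does on a given JVM still has to be measured.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not the commons-csv CSVLexer: nextToken() only
 * dispatches, and each kind of token is read by its own small method.
 * Escape handling, which plain input never exercises, lives in a separate
 * method so it stays out of the common-case loop.
 */
class SplitLexer {
    private static final char DELIMITER = ',';
    private static final char QUOTE = '"';
    private static final char ESCAPE = '\\'; // assumed escape character

    private final String input;
    private int pos = 0;

    SplitLexer(String input) {
        this.input = input;
    }

    /** Returns the next field, or null once the input is exhausted. */
    String nextToken() {
        if (pos > input.length()) {
            return null;
        }
        StringBuilder token = new StringBuilder();
        if (pos < input.length() && input.charAt(pos) == QUOTE) {
            readQuotedToken(token);
        } else {
            readSimpleToken(token);
        }
        pos++; // step over the delimiter (or past the end of the input)
        return token.toString();
    }

    /** Common case: an unquoted field up to the next delimiter. */
    private void readSimpleToken(StringBuilder token) {
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == DELIMITER) {
                return;
            }
            if (c == ESCAPE) {
                readEscape(token); // rarely taken branch, kept out of line
            } else {
                token.append(c);
                pos++;
            }
        }
    }

    /** Rare case: appends the character that follows the escape character. */
    private void readEscape(StringBuilder token) {
        pos++; // skip the escape character itself
        if (pos < input.length()) {
            token.append(input.charAt(pos++));
        } else {
            token.append(ESCAPE); // trailing escape kept literally (an assumption)
        }
    }

    /** Quoted field: a doubled quote ("") stands for one literal quote. */
    private void readQuotedToken(StringBuilder token) {
        pos++; // skip the opening quote
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c != QUOTE) {
                token.append(c);
            } else if (pos < input.length() && input.charAt(pos) == QUOTE) {
                token.append(QUOTE); // doubled quote
                pos++;
            } else {
                break; // closing quote
            }
        }
        // ignore anything between the closing quote and the next delimiter
        while (pos < input.length() && input.charAt(pos) != DELIMITER) {
            pos++;
        }
    }

    /** Convenience helper: splits one line into its fields. */
    static List<String> tokenize(String line) {
        SplitLexer lexer = new SplitLexer(line);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

For example, calling tokenize on the Java literal "a\\,b,\"c\"\"d\"" yields the two fields a,b and c"d. Keeping nextToken() as a plain dispatcher also makes each case easy to benchmark and test on its own.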

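Since innocent-looking changes can shift performance, every parser change needs an old-versus-new comparison. A minimal, deliberately naive harness for that might look like the following: it first checks that both variants agree, warms each one up so the JIT has a chance to compile it, and feeds results into a volatile field so the work is not optimized away. The class NaiveBench and the two field-counting variants are hypothetical stand-ins for an old and a new lexer. Timings from a loop like this are easily distorted by the JIT itself, so for numbers worth trusting a dedicated harness such as OpenJDK's JMH is the safer choice.

```java
import java.util.function.ToIntFunction;

/**
 * Hypothetical, deliberately naive A/B timing harness. The two variants
 * stand in for an "old" and a "new" implementation of the same operation.
 */
class NaiveBench {
    static volatile int sink; // consumes results to discourage dead-code elimination

    /** Returns the average nanoseconds per call of f on input, after a warm-up phase. */
    static double nanosPerCall(ToIntFunction<String> f, String input, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            sink += f.applyAsInt(input); // warm-up: give the JIT a chance to compile f
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += f.applyAsInt(input);
        }
        return (System.nanoTime() - start) / (double) runs;
    }

    /** Variant A: counts fields by scanning for commas. */
    static int countFieldsLoop(String line) {
        int count = 1;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == ',') {
                count++;
            }
        }
        return count;
    }

    /** Variant B: counts fields with String.split (limit -1 keeps trailing empty fields). */
    static int countFieldsSplit(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        String line = "a,b,c,d,e,f,g,h";
        // correctness first: timing two variants that disagree is meaningless
        if (countFieldsLoop(line) != countFieldsSplit(line)) {
            throw new AssertionError("variants disagree");
        }
        System.out.printf("loop : %.1f ns/call%n",
                nanosPerCall(NaiveBench::countFieldsLoop, line, 100_000, 1_000_000));
        System.out.printf("split: %.1f ns/call%n",
                nanosPerCall(NaiveBench::countFieldsSplit, line, 100_000, 1_000_000));
    }
}
```

Running each variant in a fresh JVM, and repeating the runs, helps separate a real difference from the kind of JIT side effects described in the message above.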