commons-dev mailing list archives

From Ortwin Glück (JIRA) <>
Subject [jira] Commented: (SANDBOX-166) Improve memory use
Date Wed, 09 Aug 2006 12:26:14 GMT
Ortwin Glück commented on SANDBOX-166:

The most important optimization is to reuse Token objects (and their StringBuffers).
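Here is a minimal sketch of the reuse idea. It assumes a simplified, hypothetical `Token` class that owns one `StringBuffer`. Instead of allocating a new `Token` per field, the lexer clears and refills a single instance. `Token`, `reset()` and `TokenReuseLexer` are illustrative names, not the actual commons-csv API, and the comma-only splitting ignores quoting on purpose.

```java
// Hypothetical, simplified token: a type tag plus a reusable content buffer.
final class Token {
    static final int TOKEN = 0;     // a field followed by a delimiter
    static final int EORECORD = 1;  // the last field of the line
    static final int EOF = 2;       // no more input

    int type;
    // Allocated once per Token and cleared between uses.
    final StringBuffer content = new StringBuffer(50);

    /** Clears the token so it can be refilled without reallocating. */
    Token reset() {
        content.setLength(0); // keeps the underlying char[] capacity
        type = TOKEN;
        return this;
    }
}

// Hypothetical lexer (no quote handling) that reuses one Token for every field.
final class TokenReuseLexer {
    private final String line;
    private int pos = 0;
    private final Token reusable = new Token();

    TokenReuseLexer(String line) {
        this.line = line;
    }

    /** Returns the same Token instance on every call, refilled with the next field. */
    Token nextToken() {
        Token tkn = reusable.reset();
        if (pos > line.length()) {
            tkn.type = Token.EOF;
            return tkn;
        }
        while (pos < line.length() && line.charAt(pos) != ',') {
            tkn.content.append(line.charAt(pos++));
        }
        tkn.type = (pos >= line.length()) ? Token.EORECORD : Token.TOKEN;
        pos++; // skip the delimiter, or move past the end of the line
        return tkn;
    }
}
```

Because the same instance comes back on every call, a caller must copy the content (for example with `content.toString()`) before calling `nextToken()` again.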

CSV files usually contain the same number of columns throughout the file, so after the first
line the parser should adapt itself and size its internal arrays correctly. Likewise, each
column tends to have a maximum length, so the parser should adapt dynamically and size its
StringBuffers correctly.
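One way to do this adaptation is sketched below. It assumes a hypothetical helper, `AdaptiveSizer`, that the parser would consult before and after each line. The sketch remembers the widest record and the longest value per column. `expectedColumns()` gives an initial capacity for the per-record field list. `presize()` grows a reused field buffer once, up front, instead of letting it double repeatedly. The class, its methods, and the default of 10 columns are all illustrative assumptions.

```java
// Hypothetical helper: learns the shape of a CSV file from the lines parsed
// so far, so buffers can be sized once instead of grown repeatedly.
final class AdaptiveSizer {
    private static final int DEFAULT_COLUMNS = 10; // guess before the first line
    private int maxColumns = 0;                    // widest record seen so far
    private int[] maxLengths = new int[0];         // longest value seen per column

    /** Initial capacity for the list that collects a record's fields. */
    int expectedColumns() {
        return maxColumns > 0 ? maxColumns : DEFAULT_COLUMNS;
    }

    /** Grows a reused field buffer to the longest value seen in this column. */
    void presize(StringBuffer fieldBuf, int column) {
        if (column < maxLengths.length) {
            fieldBuf.ensureCapacity(maxLengths[column]);
        }
    }

    /** Updates the statistics after a record has been parsed. */
    void learn(String[] record) {
        if (record.length > maxColumns) {
            maxColumns = record.length;
        }
        if (record.length > maxLengths.length) {
            int[] grown = new int[record.length];
            System.arraycopy(maxLengths, 0, grown, 0, maxLengths.length);
            maxLengths = grown;
        }
        for (int i = 0; i < record.length; i++) {
            if (record[i].length() > maxLengths[i]) {
                maxLengths[i] = record[i].length();
            }
        }
    }
}
```

The helper tracks maxima rather than the most recent line. A single short line therefore never shrinks the hints and cannot cause later long lines to regrow their buffers.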

For JDK 1.3 compatibility, the code calls StringBuffer.append(StringBuffer.toString()), which
copies the data twice (StringBuffer.append(StringBuffer) only exists since JDK 1.4). Using a
better character buffer can alleviate the problem.
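A "better character buffer" could look roughly like the sketch below. This is an assumption about the idea, not the actual patch. The hypothetical `GrowableCharBuffer` appends another buffer's characters with one `System.arraycopy`. The `toString()` route copies once into a new `String` and then again into the target. `clear()` keeps the backing array, so the buffer can also be reused across tokens.

```java
// Hypothetical growable char buffer (JDK 1.3-compatible APIs only).
// Appending another buffer copies its chars once, with no intermediate String.
final class GrowableCharBuffer {
    private char[] chars;
    private int length;

    GrowableCharBuffer(int initialCapacity) {
        chars = new char[Math.max(initialCapacity, 1)];
    }

    void append(char c) {
        ensureCapacity(length + 1);
        chars[length++] = c;
    }

    /** Appends the other buffer's content with a single array copy. */
    void append(GrowableCharBuffer other) {
        ensureCapacity(length + other.length);
        System.arraycopy(other.chars, 0, chars, length, other.length);
        length += other.length;
    }

    /** Empties the buffer but keeps its storage for reuse. */
    void clear() {
        length = 0;
    }

    int length() {
        return length;
    }

    /** Grows the backing array, at least doubling, when more room is needed. */
    private void ensureCapacity(int minCapacity) {
        if (minCapacity > chars.length) {
            int newCapacity = Math.max(minCapacity, chars.length * 2);
            char[] grown = new char[newCapacity];
            System.arraycopy(chars, 0, grown, 0, length);
            chars = grown;
        }
    }

    public String toString() {
        return new String(chars, 0, length);
    }
}
```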

getLine(): the empty String[0] array is immutable and should be a constant. Token objects should be reused!

nextToken(): reuse the intermediate StringBuffer wsBuf. Don't create a new instance on every call.

simpleTokenLexer(): reuse the intermediate StringBuffer wsBuf. Don't create a new instance on
every call.
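The sketch below shows two of the items above, a shared empty-array constant and a reused `wsBuf`, in a hypothetical `ReusingLexer`. It is a simplified stand-in and not the real `getLine()`/`simpleTokenLexer()`. `readField()` trims whitespace from an unquoted field. Spaces are parked in `wsBuf` and copied into the field only when more content follows. Both buffers are cleared with `setLength(0)` instead of being reallocated. It uses generics and `append(StringBuffer)`, so it assumes JDK 1.5 or later.

```java
import java.util.List;

// Hypothetical lexer fragment illustrating the reuse fixes listed above.
final class ReusingLexer {
    /** A zero-length array cannot be modified, so one shared instance suffices. */
    static final String[] EMPTY_STRING_ARRAY = new String[0];

    // Allocated once per lexer instead of once per call.
    private final StringBuffer wsBuf = new StringBuffer();
    private final StringBuffer fieldBuf = new StringBuffer();

    /** Converts collected fields to an array, returning the constant when empty. */
    static String[] toArray(List<String> fields) {
        return fields.isEmpty() ? EMPTY_STRING_ARRAY : fields.toArray(EMPTY_STRING_ARRAY);
    }

    /** Reads one unquoted field starting at start, trimming surrounding whitespace. */
    String readField(String line, int start) {
        wsBuf.setLength(0);    // clear and reuse instead of new StringBuffer()
        fieldBuf.setLength(0);
        for (int i = start; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ',') {
                break;
            } else if (c == ' ' || c == '\t') {
                wsBuf.append(c);            // hold whitespace until we know it is interior
            } else {
                if (fieldBuf.length() > 0) {
                    fieldBuf.append(wsBuf); // interior whitespace is kept (JDK 1.4+ overload)
                }
                wsBuf.setLength(0);         // leading or consumed whitespace is discarded
                fieldBuf.append(c);
            }
        }
        return fieldBuf.toString();         // trailing whitespace left in wsBuf is dropped
    }
}
```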

I'll attach a patch that addresses those issues shortly.

> Improve memory use
> ------------------
>                 Key: SANDBOX-166
>                 URL:
>             Project: Commons Sandbox
>          Issue Type: Improvement
>          Components: CSV
>    Affects Versions: Nightly Builds
>            Reporter: Ortwin Glück
>         Attachments: profile.png
> The parser is currently a real memory burner. I fed it a 4 MB CSV file and ran the TPTP
> profiler. As you can see, the parser creates around 100 MB of garbage, whereas it could (if
> really optimized) use around 4 MB. Such figures are not acceptable within a server environment.
> Please attach insights and patches to this issue report.
