commons-dev mailing list archives

From Benedikt Ritter
Subject [csv] CSVLexer.isEndOfLine(int c) makes assumptions on the line separator of a CSVFormat
Date Mon, 12 Mar 2012 17:17:01 GMT

While looking for potential performance optimizations, I came across
CSVLexer.isEndOfLine(int c). Here is the source:

    private boolean isEndOfLine(int c) throws IOException {
        // check if we have \r\n...
        if (c == '\r' && in.lookAhead() == '\n') {
            // note: does not change c outside of this method !!
            c = in.read();
        }
        return (c == '\n' || c == '\r');
    }

This method assumes that a line separator will always be "\n", "\r" or
"\r\n". This is true for the pre-configured CSVFormats EXCEL, TDF and
MYSQL. I'm not a pro when it comes to file encoding, but isn't there
the possibility that new encodings will have different line
separators? If that is the case, isEndOfLine() should somehow use
the line separator of the CSVFormat.
For example, the lookAhead only has to be made if
lineSeperator.length() > 1. This may have a positive impact on the
performance of parsing files with an encoding whose line separator is
only one char long.
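
A minimal sketch of that idea, under stated assumptions: the class
LineEndSketch, its constructor and its lineSeparator field are
hypothetical and not part of Commons CSV, the look-ahead is simulated
with java.io.PushbackReader, and only separators of one or two chars
are handled.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical sketch; this is not the Commons CSV implementation. */
public class LineEndSketch {
    private final PushbackReader in;
    // Assumption: the separator is supplied by the format; 1 or 2 chars long.
    private final String lineSeparator;

    public LineEndSketch(String text, String lineSeparator) {
        this.in = new PushbackReader(new StringReader(text), 1);
        this.lineSeparator = lineSeparator;
    }

    /** Reads the next char, or returns -1 at the end of the input. */
    public int read() throws IOException {
        return in.read();
    }

    /** Peeks at the next char without consuming it. */
    private int lookAhead() throws IOException {
        int next = in.read();
        if (next != -1) {
            in.unread(next);
        }
        return next;
    }

    /** Checks whether c, which was already read, ends the current line. */
    public boolean isEndOfLine(int c) throws IOException {
        if (lineSeparator.length() == 1) {
            // Single-char separator: no look-ahead is needed.
            return c == lineSeparator.charAt(0);
        }
        // Two-char separator such as "\r\n": peek only after a first-char match.
        if (c == lineSeparator.charAt(0) && lookAhead() == lineSeparator.charAt(1)) {
            in.read(); // consume the second char of the separator
            return true;
        }
        return false;
    }
}
```

Note that this variant is stricter than the current method: it accepts
only the configured separator, whereas the current code treats "\n",
"\r" and "\r\n" all as line ends. Whether that is desirable is a
separate question.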

