commons-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sebb <seb...@gmail.com>
Subject Re: Commons IO Tailer does not respect UTF-8 Charset
Date Fri, 26 Oct 2012 23:26:04 GMT
On 26 October 2012 23:03, Liyu Yi <liyuyi@gmail.com> wrote:
> I just realized there is a defect in the source code of
> "org.apache.commons.io.input.Tailer.java".

Thanks for the feedback.

Bug fixes are better provided as JIRA bugs.

It's hard to keep track of bug reports and patches when they are mixed
in with all the other mailing list traffic.

Could you create a JIRA issue for this problem please?

Also, in future, please do not cross-post to both the developer and user lists.
The developers follow the user list.

> Basically, the current
> implementation does not work for multi-byte encoded files. See the
> following snippet,
>
> 448 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#448>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>     private long
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>readLines(RandomAccessFile
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java#RandomAccessFile>
> reader) throws IOException
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/IOException.java#IOException>
> {
>
> 449 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#449>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         StringBuilder
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/StringBuilder.java#StringBuilder>
> sb = new StringBuilder
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/StringBuilder.java#StringBuilder>();
>
> 450 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#450>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
> 451 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#451>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         long pos = reader.getFilePointer
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java#RandomAccessFile.getFilePointer%28%29>();
>
> 452 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#452>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         long rePos = pos; // position to re-read
>
> 453 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#453>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
> 454 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#454>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         int num;
>
> 455 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#455>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         boolean seenCR = false;
>
> 456 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#456>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         while (run
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#Tailer.0run>
> && ((num = reader.read
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java#RandomAccessFile.read%28byte%5B%5D%29>(inbuf
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#Tailer.0inbuf>))
> != -1)) {
>
> 457 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#457>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>             for (int i = 0; i < num; i++) {
>
> 458 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#458>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                 byte ch = inbuf
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#Tailer.0inbuf>[i];
>
> 459 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#459>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                 switch (ch) {
>
> 460 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#460>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                 case '\n':
>
> 461 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#461>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     seenCR = false; // swallow CR before LF
>
> 462 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#462>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     listener
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#Tailer.0listener>.handle
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/TailerListener.java#TailerListener.handle%28java.lang.String%29>(sb.toString
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/StringBuilder.java#StringBuilder.toString%28%29>());
>
> 463 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#463>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     sb.setLength
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.setLength%28int%29>(0);
>
> 464 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#464>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     rePos = pos + i + 1;
>
> 465 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#465>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     break;
>
> 466 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#466>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                 case '\r':
>
> 467 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#467>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     if (seenCR) {
>
> 468 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#468>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                         sb.append
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/StringBuilder.java#StringBuilder.append%28char%29>('\r');
>
> 469 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#469>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     }
>
> 470 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#470>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     seenCR = true;
>
> 471 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#471>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     break;
>
> 472 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#472>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                 default:
>
> 473 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#473>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     if (seenCR) {
>
> 474 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#474>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                         seenCR = false; // swallow final CR
>
> 475 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#475>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                         listener
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#Tailer.0listener>.handle
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/TailerListener.java#TailerListener.handle%28java.lang.String%29>(sb.toString
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/StringBuilder.java#StringBuilder.toString%28%29>());
>
> 476 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#476>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                         sb.setLength
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/AbstractStringBuilder.java#AbstractStringBuilder.setLength%28int%29>(0);
>
> 477 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#477>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                         rePos = pos + i + 1;
>
> 478 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#478>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     }
>
> 479 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#479>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                     sb.append
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/StringBuilder.java#StringBuilder.append%28char%29>((char)
> ch); // add character, not its ascii value
>
> 480 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#480>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>                 }
>
> 481 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#481>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>             }
>
> 482 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#482>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
> 483 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#483>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>             pos = reader.getFilePointer
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java#RandomAccessFile.getFilePointer%28%29>();
>
> 484 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#484>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         }
>
> 485 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#485>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
> 486 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#486>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         reader.seek
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java#RandomAccessFile.seek%28long%29>(rePos);
> // Ensure we can re-read if necessary
>
> 487 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#487>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>         return rePos;
>
> 488 <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#488>
>
> <http://grepcode.com/file/repo1.maven.org/maven2/commons-io/commons-io/2.4/org/apache/commons/io/input/Tailer.java#>
>
>     }
>
>
> At line 479, the conversion of byte to char types breaks the encoding.
>
> I used a "hacky" fix to reconstruct the String with right encoding in the
> handler class.
>
> private String rebuildUTF8String(String line) {
>
> int len = line.length();
>
> byte[] bytes = new byte[len];
>
> for (int i=0; i<len; i++) {
>
> bytes[i] = (byte)line.charAt(i);
>
> }
>
> return new String(bytes, UTF8);
>
> }
>
> However, the right approach is to pass in the encoding to the "create"
> method and handle it in the Tailer.
>
> Regards,
>
> Liyu Yi

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Mime
View raw message