opennlp-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: svn commit: r1564379 [1/2] - in /opennlp/trunk/opennlp-tools/src: main/java/opennlp/tools/cmdline/ main/java/opennlp/tools/cmdline/chunker/ main/java/opennlp/tools/cmdline/doccat/ main/java/opennlp/tools/cmdline/namefind/ main/java/opennlp/tools/cmdlin...
Date Thu, 06 Feb 2014 13:27:14 GMT
On 02/04/2014 06:10 PM, markg@apache.org wrote:
> +++ opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/cmdline/tokenizer/CommandLineTokenizer.java Tue Feb  4 17:10:11 2014

<SNIP>

>     void process() {
> -
> -    ObjectStream<String> untokenizedLineStream =
> -        new PlainTextByLineStream(new InputStreamReader(System.in));
> -
> -    ObjectStream<String> tokenizedLineStream = new WhitespaceTokenStream(
> -        new TokenizerStream(tokenizer, untokenizedLineStream));
> -
> -    PerformanceMonitor perfMon = new PerformanceMonitor(System.err, "sent");
> -    perfMon.start();
> -
> +    ObjectStream<String> untokenizedLineStream = null;
> +
> +    ObjectStream<String> tokenizedLineStream = null;
> +    PerformanceMonitor perfMon = null;
>       try {
> +      untokenizedLineStream =
> +              new PlainTextByLineStream(new MockInputStreamFactory(System.in), "UTF-8");

The encoding should not be changed. To read from System.in, the default
encoding should be used, not UTF-8.
As far as I know, forcing UTF-8 there will not work on Windows.
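
To illustrate the point, here is a minimal sketch of reading lines from
System.in with the JVM default charset instead of a hard-coded "UTF-8".
It uses only the Java standard library and is not the OpenNLP code from
the commit; the class name DefaultEncodingStdin and the helper readLines
are hypothetical names chosen for this example. Which charset
Charset.defaultCharset() returns depends on the platform and the JDK in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, not part of OpenNLP.
public class DefaultEncodingStdin {

    // Reads all lines from the stream, decoding bytes with the JVM default charset.
    static List<String> readLines(InputStream in) throws IOException {
        return readLines(in, Charset.defaultCharset());
    }

    // Same as above, but with an explicit charset so callers can override it.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Console input is decoded with the default charset, not a fixed "UTF-8".
        for (String line : readLines(System.in)) {
            System.out.println(line);
        }
    }
}
```

Keeping the charset as a parameter leaves room for an explicit encoding
option later, while console input still falls back to the default.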

Jörn
