Message-ID: <4DAED8EC.8070005@gmx.de>
Date: Wed, 20 Apr 2011 15:00:28 +0200
From: Thilo Götz
To: user@uima.apache.org
Subject: Re: CR+LF = 1 character?

On 4/20/2011 14:31, Steven Bethard wrote:
> On Wed, Apr 20, 2011 at 10:58 AM, Jens Grivolla wrote:
>> As it turns out, the other system considers CR+LF (Windows style line
>> endings) to be two characters, while UIMA sees it as one.
>
> As Jörn suggested, this is probably a bug in the code somewhere where
> you read in the text. Perhaps you're using
> org.apache.uima.pear.util.FileUtil.loadTextFile? That's definitely
> broken in terms of line endings and I know that gave us trouble
> before. We found that org.apache.uima.util.FileUtils.file2String
> actually does the right thing, so you could use that instead. Having
> been bitten by this though, I tend to avoid the UIMA classes for
> handling files, and use com.google.common.io.Files.toString from the
> guava libraries instead, which I trust more.

This is getting slightly off-topic, but you can also use
Apache Commons IO for this sort of thing. Although I resent
having the UIMA core file utils lumped in with the pear stuff,
I can't blame you for your conclusion ;-)

--Thilo

>
> Steve
>
> P.S. Yes, I know I should have filed a bug report. Sorry for not
> getting around to it...
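
For anyone hitting the same offset mismatch, here is a minimal JDK-only sketch of one common way such a discrepancy can arise. It is an illustration under assumptions, not the code of any UIMA, Guava, or Commons IO method: the class name LineEndingDemo and both helper methods are hypothetical. Decoding the bytes in one step keeps CR+LF as two characters, while reading line by line and re-joining with "\n" turns each CR+LF into a single character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the names are not taken from any library.
class LineEndingDemo {

    // Decodes all bytes at once. Nothing is dropped or rewritten,
    // so every "\r\n" remains two characters in the result.
    static String decodePreserving(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Reads line by line. readLine() strips the line terminator, and
    // re-joining with '\n' turns each "\r\n" into a single character.
    static String decodeLineByLine(byte[] data) {
        StringBuilder sb = new StringBuilder();
        String text = new String(data, StandardCharsets.UTF_8);
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crlf-demo", ".txt");
        try {
            // Two lines with Windows-style line endings.
            Files.write(tmp, "ab\r\ncd\r\n".getBytes(StandardCharsets.UTF_8));
            byte[] data = Files.readAllBytes(tmp);
            System.out.println(decodePreserving(data).length()); // prints 8
            System.out.println(decodeLineByLine(data).length()); // prints 6
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If annotation offsets have to line up with another system that counts CR+LF as two characters, a whole-file read along the lines of decodePreserving is the safer starting point. Comparing the two lengths on a sample document is a quick way to check whether a given loader rewrites line endings.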