uima-user mailing list archives

From Thilo Götz <twgo...@gmx.de>
Subject Re: CR+LF = 1 character?
Date Wed, 20 Apr 2011 13:00:28 GMT
On 4/20/2011 14:31, Steven Bethard wrote:
> On Wed, Apr 20, 2011 at 10:58 AM, Jens Grivolla <j+asf@grivolla.net> wrote:
>> As it turns out, the other system considers CR+LF (Windows style line
>> endings) to be two characters, while UIMA sees it as one.
> 
> As Jörn suggested, this is probably a bug in the code somewhere where
> you read in the text. Perhaps you're using
> org.apache.uima.pear.util.FileUtil.loadTextFile? That's definitely
> broken in terms of line endings and I know that gave us trouble
> before. We found that org.apache.uima.util.FileUtils.file2String
> actually does the right thing, so you could use that instead. Having
> been bitten by this though, I tend to avoid the UIMA classes for
> handling files, and use com.google.common.io.Files.toString from the
> guava libraries instead, which I trust more.

This is getting slightly off-topic, but you can also use
Apache Commons IO for this sort of thing.
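
For illustration, here is a minimal sketch of how such a one-vs-two character discrepancy can arise, using only the JDK so that it runs without extra dependencies. The class and method names (LineEndingDemo, readPreserving, readLineByLine) are made up for this example. The line-by-line variant shows one common pattern that collapses CR+LF into a single character; I am not claiming this is what any particular UIMA method does internally.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of UIMA, Guava, or Commons IO.
class LineEndingDemo {

    // Reads all bytes and decodes them in one step, so CR and LF
    // both survive and CR+LF counts as two characters.
    static String readPreserving(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Reads line by line. readLine() strips the line terminator, and
    // re-joining with '\n' turns every CR+LF into a single character.
    static String readLineByLine(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("crlf", ".txt");
        try {
            Files.write(tmp, "ab\r\ncd\r\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(readPreserving(tmp).length()); // prints 8
            System.out.println(readLineByLine(tmp).length()); // prints 6
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If annotation offsets have to line up with another system that counts CR+LF as two characters, the text handed to the CAS needs to be read the way readPreserving does it, whichever library ends up doing the reading.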

Although I resent having the UIMA core file utils lumped
in with the pear stuff, I can't blame you for your conclusion ;-)

--Thilo

> 
> Steve
> 
> P.S. Yes, I know I should have filed a bug report. Sorry for not
> getting around to it...
