lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: failure in the Russian Analyzer in contrib
Date Fri, 11 Feb 2005 18:24:40 GMT
On Feb 11, 2005, at 11:19 AM, Vanlerberghe, Luc wrote:
> I'm suspecting subversion now: the stemsUnicode.txt and wordsUnicode.txt
> files are encoded in UTF-16 (they have the proper two-byte byte-order
> prefix) and have the property svn:eol-style set to native.
> On my (Windows :( ) system the files are 904424 and 1101164 bytes long
> and are full of "0d 0a 00" byte sequences, which in Unicode should
> probably just be "0a 00" or "0d 00 0a 00".
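
The suspected failure mode can be reproduced in a few lines. The sketch below assumes that the end-of-line translation works on raw bytes without knowing the file is UTF-16, and that the files are little-endian. That matches the "0a 00" pattern quoted above but is not confirmed in this thread. The helper name naive_crlf and the sample text are made up for illustration.

```python
import codecs


def naive_crlf(data: bytes) -> bytes:
    """Byte-level LF -> CRLF translation that ignores the text encoding."""
    return data.replace(b"\n", b"\r\n")


# Two short Cyrillic lines in UTF-16LE with a byte-order mark and LF endings.
text = "аб\nвг\n"
original = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
damaged = naive_crlf(original)

print(original.hex(" "))  # each line ends in "0a 00"
print(damaged.hex(" "))   # each line now ends in "0d 0a 00"

# Every LF gained exactly one stray 0x0d byte, so later code units
# no longer sit on two-byte boundaries.
print(damaged.count(bytes.fromhex("0d 0a 00")), len(damaged) - len(original))

# The damaged bytes still decode, but not to the original text.
print(damaged.decode("utf-16") == text)
```

A correct CRLF in UTF-16LE would be the four bytes "0d 00 0a 00". Inserting a single 0x0d byte shifts everything after it by one byte until the next line ending shifts it again.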

My files have these sizes:

$ ls -l
total 3608
-rw-r--r--  1 erik  erik   805080 11 Feb 08:30 stemsUnicode.txt
-rw-r--r--  1 erik  erik  1001820 11 Feb 08:30 wordsUnicode.txt
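
Comparing the two sets of sizes is a quick sanity check on the line-ending theory. The sketch below uses only the byte counts reported in this thread. Reading the difference as "one extra byte per line" is an assumption, not something verified against the files.

```python
# Sizes in bytes as reported in this thread.
windows_sizes = {"stemsUnicode.txt": 904424, "wordsUnicode.txt": 1101164}
unix_sizes = {"stemsUnicode.txt": 805080, "wordsUnicode.txt": 1001820}

# Extra bytes present in the Windows working copy of each file.
extra = {name: windows_sizes[name] - unix_sizes[name] for name in unix_sizes}
print(extra)  # {'stemsUnicode.txt': 99344, 'wordsUnicode.txt': 99344}
```

Both files grow by the same 99344 bytes. That is what one stray byte per line ending would produce if the two files have the same number of lines.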

> Is there a way to do a svn update --raw or something that I can check
> this?

No, svn doesn't have this type of switch.

> If this is indeed the problem, a possible fix would be to set the
> svn:eol-style to LF or else let svn know that the file is in unicode
> (perhaps setting the svn:mime-type property to something else than the
> default?)

I have set the svn:eol-style property to LF on both of those files.  
Let me know if that fixes the issue.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

