lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Problem using Lucene on Ubuntu
Date Mon, 18 Feb 2008 14:22:12 GMT
Good point, Jan!

On Feb 18, 2008, at 9:00 AM, Jan Peter Stotz wrote:

> Grant Ingersoll wrote:
>
>> Note: ENCODING is whatever encoding the file is in, as in "UTF-8",  
>> if that is what your files are in.
>
> I think there is a misunderstanding: the WordExtractor extracts text  
> from MS Word (.doc) files. Those files are binary and therefore do  
> not have a text encoding.
> I would write the extracted text to plain text files and  
> check whether the file generated on Windows differs from the one  
> generated on Linux/Ubuntu. That lets you determine whether this is a  
> WordExtractor problem or a Lucene problem.
>
> Jan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
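
A rough sketch of the comparison Jan describes, in case it helps. It assumes
the text has already been pulled out of the .doc file as a String (for
example by WordExtractor); the class and method names here (ExtractDump,
dump, firstDifference) are made up for illustration and are not part of
Lucene or of the extractor. The one thing that matters is writing the dump
with an explicit charset, so that the platform default, which commonly
differs between Windows and Ubuntu, cannot affect the result.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene or the extractor.
public class ExtractDump {

    // Write the already-extracted text as UTF-8, independent of the
    // platform default charset.
    static void dump(String extractedText, Path out) throws IOException {
        Files.write(out, extractedText.getBytes(StandardCharsets.UTF_8));
    }

    // Return the index of the first differing char between the two dumps,
    // or -1 if their contents are identical.
    static int firstDifference(Path a, Path b) throws IOException {
        String x = new String(Files.readAllBytes(a), StandardCharsets.UTF_8);
        String y = new String(Files.readAllBytes(b), StandardCharsets.UTF_8);
        int n = Math.min(x.length(), y.length());
        for (int i = 0; i < n; i++) {
            if (x.charAt(i) != y.charAt(i)) {
                return i;
            }
        }
        // Equal up to the shorter length: identical, or one is a prefix.
        return x.length() == y.length() ? -1 : n;
    }
}
```

Run dump on both machines against the same .doc file, copy one result over,
and call firstDifference on the pair. If the dumps already differ, the
problem is upstream of Lucene; if they match, look at how the text is
analyzed and indexed.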




