lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steven Bell <sfb...@keasdesign.net>
Subject Re: MS-Word docs.
Date Mon, 28 Nov 2005 04:28:03 GMT
I did run down the issue. And it's a case of tired coder. I wasn't 
creating a new document object in the method I was using to handle word 
documents.

Thanks very much for the links guys, I appreciate it!

steve.

Chris Hostetter wrote:

>: I dump the doc files into a text file with the same variable I use in
>: the Lucene doc.add(Field.UnStored("content", textStr));| and they look
>: fine in the file. However searches return nothing.
>
>if i'm reading that sentence correctly, then you are saying that you've
>tried isolating your MS-Word text extraction from your Lucene indexing and
>confirmed that the MS-Word text extraction is working.
>
>This is a very good first step.
>
>Now if you want to be certain that your indexing is working correclty, i
>suggest you try using a tool like Luke to check exactly what terms re
>being stored in your index.  I'm guessing that the problems you are having
>are a result of "analysis paralysis" as i've seen it called many times...
>
>This wiki covers a lot of things you should check...
>
>http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71
>
>
>
>-Hoss
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message