lucene-java-user mailing list archives

From "" <>
Subject Not entire document being indexed?
Date Thu, 24 Feb 2005 19:08:07 GMT
Hi everyone

I'm having a bizarre problem: a few of the documents here do not seem
to get indexed entirely.

I use textmining WordExtractor to convert MS Word files to plain text
and then index that text. For example, one document is about 230KB once
converted to plain text. After indexing it, searching for a phrase from
the last 2-3 paragraphs returns no hits, yet searching for anything
above those paragraphs works just fine. WordExtractor does convert the
entire document to text; I've checked that.

I've tried increasing the number of terms per field from the default
10,000 to 20,000 with writer.maxFieldLength, but that didn't make any
difference: I still can't find phrases from the last 2-3 paragraphs.
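
One way to check whether the 20,000 limit could still be too low is to
count the tokens in the extracted text and compare that number against
the limit. A minimal sketch, assuming that whitespace splitting roughly
approximates what the analyzer produces (a real analyzer will give a
different count); the class and method names are hypothetical and not
part of Lucene:

```java
import java.util.StringTokenizer;

public class TermCountCheck {

    // Counts whitespace-separated tokens. This is only a rough stand-in
    // for the number of terms an analyzer would emit for the same text.
    static int countTokens(String text) {
        return new StringTokenizer(text).countTokens();
    }

    // Returns true if the text has more tokens than the given limit,
    // i.e. the tail of the field would fall outside that limit.
    static boolean exceedsLimit(String text, int maxFieldLength) {
        return countTokens(text) > maxFieldLength;
    }

    public static void main(String[] args) {
        // Stand-in for the plain text produced by WordExtractor.
        String extracted = "text converted from the Word document";
        int limit = 20000; // the value tried with writer.maxFieldLength

        System.out.println("tokens: " + countTokens(extracted));
        System.out.println("over limit: " + exceedsLimit(extracted, limit));
    }
}
```

If the count for the 230KB document comes out above the limit, that
would be consistent with only the last paragraphs being unsearchable.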

Any ideas as to why this could be happening and how I could rectify it?


