lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "amigo@max3d.com" <am...@max3d.com>
Subject Re: Not entire document being indexed?
Date Fri, 25 Feb 2005 23:08:59 GMT
Anyone else has any ideas why wouldn't the whole documents be indexed as 
described below?

Or perhaps someone can enlighten me on how to use Luke to find out if 
the whole document was indexed or not.
I have not used Luke in such capacity before so not sure what to do or 
look for?

thanks

-pedja


amigo@max3d.com said the following on 2/24/2005 2:08 PM:

> Hi everyone
>
> I'm having a bizzare problem with a few of the documents here that do 
> not seem to get indexed entirely.
>
> I use textmining WordExtractor to convert M$ Word to plain text and 
> then index that text.
> For example one document which is about 230KB in size when converted 
> to plain text, when indexed and
> later searched for a pharse in the last 2-3 paragraphs returns no 
> hits, yet searching anything above those
> paragraphs works just fine. WordExtractor does convert the entire 
> document to text, I've checked that.
>
> I've tried increasing the number of terms per field from default 
> 10,000 to 20,000 with writer.maxFieldLength
> but that didnt make any difference, still cant find phrases from the 
> last 2-3 paragraphs.
>
> Any ideas as to why this could be happening and how I could rectify it?
>
>
> thanks,
>
> -pedja
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message