lucene-java-user mailing list archives

From "Pasha Bizhan" <fc...@ok.ru>
Subject RE: Not entire document being indexed?
Date Fri, 25 Feb 2005 23:32:25 GMT
Hi, 

> From: amigo@max3d.com [mailto:amigo@max3d.com] 

> Or perhaps someone can enlighten me on how to use Luke to find out if the
> whole document was indexed or not.

Luke can help you answer the question: does my index contain the correct
data?

Follow these steps:
- run Luke
- open the index
- find the specified document (document tab)
- click "reconstruct and edit" button
- select the field and look at the original content of this field, as
reconstructed from the index

Does this reconstructed content contain your last 2-3 paragraphs?

Also, 230 KB of text is not the same as 20,000 terms. Try setting
writer.maxFieldLength to 250,000.
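
To see why a limit of 20,000 terms can hide the end of a 230 KB document, here is a rough sketch in plain Java. It does not call Lucene at all: the class MaxFieldLengthDemo and its methods are hypothetical names invented for this illustration, the whitespace tokenizer is a stand-in for a real analyzer, and the filler document is synthetic. It only mimics the idea of a per-field term limit, under the assumption that terms past the limit are simply dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of a per-field term limit; not Lucene code.
class MaxFieldLengthDemo {

    // Split on whitespace and keep at most maxFieldLength terms.
    // A real analyzer tokenizes differently (e.g. it may drop stop words).
    static List<String> indexedTerms(String text, int maxFieldLength) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        String[] all = trimmed.split("\\s+");
        int kept = Math.min(all.length, maxFieldLength);
        return new ArrayList<>(Arrays.asList(all).subList(0, kept));
    }

    // True if the words of the phrase occur consecutively in the kept terms.
    static boolean containsPhrase(List<String> terms, String phrase) {
        List<String> wanted = Arrays.asList(phrase.trim().split("\\s+"));
        return Collections.indexOfSubList(terms, wanted) >= 0;
    }

    // Synthetic document of roughly 230 KB: filler words, then a tail phrase.
    static String sampleDocument() {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 230 * 1024) {
            sb.append("lorem ");
        }
        sb.append("closing remarks paragraph");
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = sampleDocument();
        String phrase = "closing remarks paragraph";

        int total = indexedTerms(doc, Integer.MAX_VALUE).size();
        System.out.println("total terms in document: " + total);

        for (int limit : new int[] {10000, 20000, 250000}) {
            boolean found = containsPhrase(indexedTerms(doc, limit), phrase);
            System.out.println("limit " + limit + ": tail phrase found = " + found);
        }
    }
}
```

With six-character filler words the synthetic document holds well over 20,000 terms, so the tail phrase survives only under the largest limit. Real text will have a different term count, which is why a generous value such as 250,000 is a safer guess than a small increase.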

Pasha Bizhan
http://lucenedotnet.com

> > For example one document which is about 230KB in size when converted
> > to plain text, when indexed and later searched for a phrase in the
> > last 2-3 paragraphs returns no hits, yet searching anything above
> > those paragraphs works just fine. WordExtractor does convert the
> > entire document to text, I've checked that.
> >
> > I've tried increasing the number of terms per field from default
> > 10,000 to 20,000 with writer.maxFieldLength but that didn't make any
> > difference, still can't find phrases from the last 2-3 paragraphs.
> >
> > Any ideas as to why this could be happening and how I could
> > rectify it?
> >
> >
> > thanks,
> >
> > -pedja



