lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <>
Subject Re: retrieve tokens
Date Wed, 22 Dec 2004 18:49:20 GMT
On Dec 22, 2004, at 12:43 PM, M. Smit wrote:
> Erik Hatcher wrote:
> But for the other issue on 'store lucene' vs 'store db'. Does anyone 
> can provide me with some field experience on size?
> The system I'm developing will provide searching through some 2000 
> pdf's, say some 200 pages each. I feed the plain text into Lucene on a 
> Field.UnStored bases. I also store this plain text in the database for 
> the sole purpose of presenting a context snippet.
> If I were to use the Highlighter with a Field.Text, I will not use the 
> database plain part altogether. But still I'm a little worried about 
> speed/space issues. Or am I just seeing bears-on-the-road (Dutch 
> saying, in plain English: making a fuzz about nothing)..

Consider that you're only highlighting 20 or so entries at one time.  
Getting the text from a Lucene index you're already navigating will be 
quite quick.  But it shouldn't be too bad to pull 20 records from a 
database either.

There is one other consideration, and that is to use the new (CVS only) 
feature of capturing term vectors with position information.  The 
author of the Highlighter, Mark Harwood, has posted in the not too 
distant past, an update to the Highlighter that can use this position 
information for highlighting rather than re-analyzing the original 
text.  The re-analysis of the text may be the bottleneck, not the 
database access.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message