lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [jira] Commented: (LUCENE-845) If you "flush by RAM usage" then IndexWriter may over-merge
Date Mon, 26 Mar 2007 18:18:03 GMT
Steven Parkes wrote:
> And what about Project Gutenberg?
> 
> Wikipedia is going to have relatively short text, Gutenberg very long.

Very long documents are useful for testing for anomalies, but they're 
not so useful as retrieved documents, nor typical of applications.  Very 
long hits are awkward for users.  Book search engines usually operate 
best by breaking texts into small units (chapters, pages, 
overlapping windows, etc.) and searching those rather than the entire 
work, perhaps merging multiple hits from the same work in displayed 
results.  (See, e.g., California Digital Library's XTF system, built by 
Kirk Hastings using Lucene. http://www.cdlib.org/inside/projects/xtf/)
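
To make the chunking idea above concrete, here is a rough sketch in plain Java. It is not XTF's approach and uses no Lucene API; the class and method names (ChunkSketch, overlappingWindows, bestHitPerWork, Hit), the word-based splitting, and the "keep the best-scoring chunk per work" merge rule are all assumptions chosen for illustration. Each window would be indexed as its own small document, and hits would be grouped by work before display.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChunkSketch {

    /** Hypothetical hit: the work a chunk belongs to, the chunk's index, and its score. */
    static final class Hit {
        final String workId;
        final int chunkIndex;
        final double score;

        Hit(String workId, int chunkIndex, double score) {
            this.workId = workId;
            this.chunkIndex = chunkIndex;
            this.score = score;
        }
    }

    /**
     * Splits text into windows of at most `size` words, where consecutive
     * windows share `overlap` words. Word-based splitting is an assumption;
     * chapters or pages would work just as well as units.
     */
    static List<String> overlappingWindows(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need size > 0 and 0 <= overlap < size");
        }
        List<String> windows = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return windows;
        }
        String[] words = trimmed.split("\\s+");
        int step = size - overlap;
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + size, words.length);
            windows.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window already reaches the end of the text
            }
        }
        return windows;
    }

    /**
     * Merges hits from the same work by keeping only the best-scoring chunk
     * per work, in order of each work's first appearance in the input.
     */
    static Map<String, Hit> bestHitPerWork(List<Hit> hits) {
        Map<String, Hit> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            best.merge(h.workId, h, (a, b) -> a.score >= b.score ? a : b);
        }
        return best;
    }

    public static void main(String[] args) {
        // Print the windows for a tiny sample text.
        for (String w : overlappingWindows("one two three four five six seven", 4, 2)) {
            System.out.println(w);
        }
    }
}
```

Keeping one hit per work is only one possible merge rule; a result page could equally list the top few chunks under each work.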

I think Wikipedia is a much more typical use of Lucene.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org