On Aug 17, 2005, at 3:29 PM, Dan Funk wrote:
> Currently I'm working with a single index where content is indexed
> by it's original printed page. I have to show the total number of
> matching documents, so I end up running through all the hits and
> taking an order of magnitude hit on performance as I calculate the
> number of unique documents. It's stupid for many many reasons.
>
> To correct all this, I've decided to create two (maybe three)
> indexes for the same set of documents: in the first index there is
> a one to one relationship between the original document and the
> Lucene Document object. The other index is a paragraph index,
> where each lucene document represents a single paragraph. I may
> even throw in a third index where each lucene document represents a
> logical section/chapter.
>
> When I'm building the search results page I'll have to execute a
> fair number of queries. The first query will execute on the
> Document-Index, then for each of the 10 to 2o results I'm
> displaying at the time, I'll execute another query to find the best
> paragraph and or section.
>
> Is this a reasonable solution to the problem?
> Thanks for the advice.
Just one design alternative - a Lucene index does not have to be
homogenous in terms of the fields for a document. So you could index
all those various granularities into a single index with an
additional field per document indicating whether it is a document,
paragraph, or section/chapter.
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|