lucene-java-user mailing list archives

From "Erick Erickson"
Subject Re: One (large) field shared by many documents
Date Sat, 19 May 2007 21:49:30 GMT
This seems kind of kludgy, but that may just mean I don't understand
your problem very well.

What is it that you're trying to accomplish? Searching constrained
by topic or groups?

If you're trying to search by groups, search the archive for the
word "facet" or "faceted search".

Otherwise, could you describe the behavior you're after? That might
spark more ideas.


On 5/19/07, Peter Bloem wrote:
> Hi,
> I have the following problem. I'm indexing documents that belong to some
> collection (i.e., the dataset is divided into collections, which are
> divided into documents). These documents become my Lucene documents,
> with some relatively small string that becomes the field I want to
> search. However, I would also like to add to document d the
> concatenation of all documents in d's collection as a field (mainly as a
> smoothing technique, because documents correspond roughly to topics).
> I'm currently doing just that, adding an extra field for the entire
> concatenated collection to each document in that collection. Of course
> this increases the index size and indexing time greatly (about five-fold).
> There must be a better way to do this. My idea was to create a second
> index where the collections are indexed as (Lucene) documents. This
> index would have the text as a field, and a list of document IDs
> referring back to the main index. I could then retrieve the term vector
> for each collection from this second index for each search result from
> the original index.
> My question is whether this is a smart approach and, if it is, which of
> Lucene's classes I should use for it. The best I could find was the
> FilterIndexReader. If extending the FilterIndexReader is really the best
> way to go, could I simply override the document(int, FieldSelector)
> method, or is there more to it? I doubt I'm the first person who has ever
> wanted a many-to-one relation between fields and documents, so I hope
> there's a simpler way to do this.
> Thank you,
> Peter
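
For reference, the second-index idea in the quoted message could be
sketched roughly as follows. This is only an illustration in plain Java,
not Lucene code: the class SharedCollectionField, its methods, and the
in-memory maps standing in for the two indexes are all hypothetical, and
the linear interpolation in smoothedProbability is just one assumed way
to use collection text for smoothing. The point it shows is that each
collection's term statistics are stored once and joined to a hit through
a collection ID, rather than copied into every document.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the main index and the
// per-collection index. No Lucene classes are used here.
public class SharedCollectionField {
    // "Main index": each document records only its collection ID and its own terms.
    private final Map<Integer, String> docToCollection = new HashMap<>();
    private final Map<Integer, Map<String, Integer>> docTerms = new HashMap<>();
    // "Second index": term frequencies per collection, stored once per collection.
    private final Map<String, Map<String, Integer>> collectionTerms = new HashMap<>();

    // Index a document and fold its terms into its collection's statistics.
    public void add(int docId, String collectionId, String text) {
        docToCollection.put(docId, collectionId);
        Map<String, Integer> docTf = docTerms.computeIfAbsent(docId, k -> new HashMap<>());
        Map<String, Integer> collTf =
                collectionTerms.computeIfAbsent(collectionId, k -> new HashMap<>());
        for (String term : tokenize(text)) {
            docTf.merge(term, 1, Integer::sum);
            collTf.merge(term, 1, Integer::sum);
        }
    }

    // For a search hit, look up how often a term occurs in the hit's whole collection.
    public int collectionFrequency(int docId, String term) {
        String collectionId = docToCollection.get(docId);
        if (collectionId == null) {
            return 0;
        }
        return collectionTerms.get(collectionId).getOrDefault(term, 0);
    }

    // Assumed smoothing scheme: interpolate the document's relative term
    // frequency with the collection's, weighted by lambda in [0, 1].
    public double smoothedProbability(int docId, String term, double lambda) {
        String collectionId = docToCollection.get(docId);
        if (collectionId == null) {
            return 0.0;
        }
        double pDoc = relativeFrequency(docTerms.get(docId), term);
        double pColl = relativeFrequency(collectionTerms.get(collectionId), term);
        return lambda * pDoc + (1.0 - lambda) * pColl;
    }

    private static double relativeFrequency(Map<String, Integer> tf, String term) {
        int total = 0;
        for (int count : tf.values()) {
            total += count;
        }
        return total == 0 ? 0.0 : (double) tf.getOrDefault(term, 0) / total;
    }

    // Naive lowercase whitespace tokenizer, a stand-in for a real analyzer.
    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

In a real setup the collection ID would presumably be a small stored
field on each document in the main index, and the per-collection
statistics would come from the second index; whether that beats the
five-fold larger single index depends on how the scores are combined at
search time.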