lucene-java-user mailing list archives

From Peter Bloem...@peterbloem.nl>
Subject One (large) field shared by many documents
Date Sat, 19 May 2007 15:27:55 GMT
Hi,

I have the following problem. I'm indexing documents that belong to some
collection (i.e., the dataset is divided into collections, which are
divided into documents). These documents become my Lucene documents,
each with a relatively small string that becomes the field I want to
search. However, I would also like to add to each document d the
concatenation of all documents in d's collection as a field (mainly as a
smoothing technique, because documents correspond roughly to topics).
I'm currently doing just that, adding an extra field for the entire
concatenated collection to each document in that collection. Of course,
this greatly increases the index size and indexing time (about five-fold).

There must be a better way to do this. My idea was to create a second
index in which the collections are indexed as (Lucene) documents. This
index would have the collection text as a field, plus a list of document
IDs referring back to the main index. For each search result from the
main index, I could then retrieve the term vector of its collection
from this second index.
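
A rough sketch of that two-index idea, written in plain Java with no
Lucene calls at all, might look like the following. Every class, field,
and method name in it is hypothetical, and the maps only stand in for the
two indexes. It also assumes each document records the ID of its
collection, which is the reverse of the back-reference list described
above, because that makes the per-hit lookup a single step.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of the two-index idea; no Lucene calls are used.
// All class, field, and method names here are hypothetical.
final class CollectionLookupSketch {

    // Stand-in for the main index: document id -> id of its collection.
    private final Map<Integer, Integer> collectionOfDoc = new HashMap<>();

    // Stand-in for the second index: collection id -> term frequencies
    // of the concatenated collection text (a crude "term vector").
    private final Map<Integer, Map<String, Integer>> collectionTerms = new HashMap<>();

    // Register a document. Its text is counted once, in the collection
    // entry, instead of being copied into every document of the collection.
    void addDocument(int docId, int collectionId, String text) {
        collectionOfDoc.put(docId, collectionId);
        Map<String, Integer> freqs =
                collectionTerms.computeIfAbsent(collectionId, k -> new HashMap<>());
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                freqs.merge(term, 1, Integer::sum);
            }
        }
    }

    // For a search hit from the main index, fetch the shared collection
    // vector. Unknown document ids yield an empty map.
    Map<String, Integer> collectionVectorFor(int docId) {
        Integer collectionId = collectionOfDoc.get(docId);
        if (collectionId == null) {
            return Map.of();
        }
        return collectionTerms.getOrDefault(collectionId, Map.of());
    }
}
```

With this layout the collection text is stored once per collection, and
each hit costs one extra lookup by collection ID. Whether the same
trade-off holds inside Lucene depends on how the second index is queried,
which the sketch does not model.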

My question is whether this is a smart approach and, if it is, which of
Lucene's classes I should use for it. The best I could find was
FilterIndexReader. If extending FilterIndexReader is really the best
way to go, could I simply override the document(int, FieldSelector)
method, or is there more to it? I doubt I'm the first person who has ever
wanted a many-to-one relation between fields and documents, so I hope
there's a simpler way to go about this.

Thank you,
 Peter
