lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Timm, Andy (ETW)" <Andy.T...@nike.com>
Subject removing duplicate Documents from Hits
Date Fri, 01 Oct 2004 19:00:49 GMT
Hello, I've searched on previous posts on this topic but couldn't find an answer.  I want to
query my index (which are a number of 'flattened' Oracle tables) for some criteria, then return
Hits such that there are no Documents that duplicate a particular field.  In the case where
table A has a one-to-many relationship to table B, I get one Document for each (A1-B1, A1-B2,
A1-B3...).  My index needs to have each of these records as 'B' is a searchable field in the
index.  However, after the query is executed, I want my resulting Hits on be unique on 'A'.
 I'm only returning the Oracle object ID, so once I've seen it once I don't need it again.
 It looks like some sort of custom Filter is in order.  My fix at the moment is to run the
query, then store unique id's in a Map to build another query that will return singletons
on field 'A'.  I could skip this step if there was a way to remove documents from Hits (I
didn't see a way).  Has anyone written a filter that does this?  Are there others using Lucene
to mimic a relational DB?  I've got a complex SQL search that joins (most outer) 40 some tables.
 Query performance is important, and the tables are relatively static.  I find the ID's of
the objects that match the users' criteria, then go to the DB to instantiate them.  Any comments
are appreciated.  


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message