incubator-lucy-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject [lucy-dev] LucyX::Search::DedupingSearcher
Date Tue, 13 Dec 2011 02:54:25 GMT
Greets,

A while back, I wrote a deduping searcher for KinoSearch 0.2x.  A few users
have expressed interest in such a beast, so I've contributed the old code to a
JIRA issue.  When someone wants to work on it, I'll provide help on how to
update it for Lucy.

  https://issues.apache.org/jira/browse/LUCY-198

The API is an IndexSearcher subclass with two extra constructor arguments:
hits_per_unique and dedup_field.  The algorithm, which comes from an old
Lucene module (IIRC, Andrzej Bialecki and Doug Cutting were involved), is to
rerun the search multiple times if necessary, adding filtering to exclude
unwanted results on later iterations.
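
For anyone who wants a feel for the rerun-and-filter idea before opening the
JIRA issue, here is a rough Python sketch. It is not the KinoSearch code
attached to LUCY-198 and not a Lucy API: raw_search, deduping_search, DOCS,
and the "id", "host", and "score" keys are all hypothetical stand-ins, and
the toy "index" is just a list of dicts. Only hits_per_unique and dedup_field
are names taken from the description above.

```python
from collections import Counter

# Hypothetical toy "index": each doc has an id, a score, and a field to dedupe on.
DOCS = [
    {"id": 1, "host": "a", "score": 0.9},
    {"id": 2, "host": "a", "score": 0.8},
    {"id": 3, "host": "a", "score": 0.7},
    {"id": 4, "host": "b", "score": 0.6},
    {"id": 5, "host": "c", "score": 0.5},
    {"id": 6, "host": "b", "score": 0.4},
]


def raw_search(docs, num_wanted, dedup_field, excluded_values):
    """Stand-in for a real search: the top `num_wanted` docs by score,
    filtered to skip any doc whose dedup_field value is excluded."""
    candidates = [d for d in docs if d[dedup_field] not in excluded_values]
    candidates.sort(key=lambda d: d["score"], reverse=True)
    return candidates[:num_wanted]


def deduping_search(docs, num_wanted, dedup_field, hits_per_unique=1):
    """Keep at most `hits_per_unique` hits per distinct dedup_field value,
    rerunning the search with a growing exclusion filter as needed."""
    kept = []              # accepted hits, in rank order
    kept_ids = set()       # ids of accepted hits, to skip repeats on reruns
    per_value = Counter()  # accepted hits per dedup_field value
    excluded = set()       # values that have already used up their quota

    while len(kept) < num_wanted:
        hits = raw_search(docs, num_wanted, dedup_field, excluded)
        progress = False
        for hit in hits:
            value = hit[dedup_field]
            if hit["id"] in kept_ids or value in excluded:
                continue
            kept.append(hit)
            kept_ids.add(hit["id"])
            per_value[value] += 1
            progress = True
            if per_value[value] >= hits_per_unique:
                # Quota reached: filter this value out of later passes.
                excluded.add(value)
            if len(kept) >= num_wanted:
                break
        if not progress:
            # A pass that adds nothing means the matches are exhausted.
            break
    return kept


top = deduping_search(DOCS, num_wanted=3, dedup_field="host", hits_per_unique=1)
print([hit["id"] for hit in top])
```

In this toy the first pass is dominated by host "a", so only one hit survives;
the second pass excludes "a" and fills the remaining slots from "b" and "c".
A real implementation would express the exclusion as a query filter rather
than a Python set, and might ask for more raw hits on each rerun.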

If anybody wants to work on it or has commentary about the design, speak up.
Otherwise, if there are no objections, expect it to arrive in trunk as
LucyX::Search::DedupingSearcher when somebody finds the tuits to work up a
patch.

Marvin Humphrey

