lucene-solr-user mailing list archives

From <tomas.zer...@axelspringer.de>
Subject Filtering results based on a set of values for a field
Date Tue, 16 Aug 2011 07:56:51 GMT
Hello, Solrs

we are trying to filter out documents written by (one or more of) the authors from
a mediumish list (~2K). The document set itself is in the millions.

Apart from the obvious approach of building a huge OR-list and appending it
to the query, writing a Lucene Filter[1] (or a SolrFilter[2]) seems to suggest
itself. In fact, [3] seems to strongly encourage this approach.
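
For reference, the OR-list approach can be sketched roughly as follows. This is
a minimal sketch, not our actual code: the field name "author", the sample
names, and the class OrFilterQuery are assumptions made for illustration, and
it treats the listed authors as the ones to match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OrFilterQuery {
    // Quote a value as a phrase, escaping backslashes and double quotes.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build a clause such as author:("a" OR "b") from a list of values.
    static String build(String field, List<String> values) {
        return values.stream()
                .map(OrFilterQuery::quote)
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }

    public static void main(String[] args) {
        // "author" is a hypothetical field name.
        List<String> authors = Arrays.asList("Jane Doe", "John Smith");
        System.out.println(build("author", authors));
        // prints author:("Jane Doe" OR "John Smith")
    }
}
```

The resulting string could be appended to the query or sent as a filter query
(fq). With a list of ~2K authors, the number of clauses may exceed the
maxBooleanClauses limit in solrconfig.xml (1024 by default), so that setting
would probably need to be raised.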

Basically, as we understand it, the filter's getDocIdSet method gets called and
is fed index segments, "one spoonful at a time". It then decides which docs
of the segment will be accepted, setting the corresponding bits in the result (in
our case, e.g. by looking up the document's author's name in a HashMap or
something like it).
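
The per-segment logic described above can be modelled in plain Java as below.
This is only a sketch of the idea and deliberately does not use the Lucene
Filter API: the String array standing in for a segment, the class
AuthorFilterSketch and the method acceptedDocs are all hypothetical, and the
listed authors are assumed to be the ones to accept.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class AuthorFilterSketch {
    // authorByDoc[i] is the author of the i-th document in one segment
    // (a hypothetical stand-in for reading the field from the index).
    static BitSet acceptedDocs(String[] authorByDoc, Set<String> wantedAuthors) {
        BitSet accepted = new BitSet(authorByDoc.length);
        // Sequential scan: one hash lookup per document in the segment.
        for (int docId = 0; docId < authorByDoc.length; docId++) {
            if (wantedAuthors.contains(authorByDoc[docId])) {
                accepted.set(docId);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("alice", "carol"));
        String[] segment = {"alice", "bob", "carol", "bob"};
        System.out.println(acceptedDocs(segment, wanted)); // prints {0, 2}
    }
}
```

Note that the loop touches every document in the segment, so the cost grows
with the size of the index rather than with the size of the author list.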

Our first question is: how does it all fit together? Would it be enough to write
such a class? How do we reference it in the Solr configuration? In the query?
Should it be a Lucene Filter or a SolrFilter?

The problem is, we are experiencing very slow response times, on the order of
12 seconds per query (the OR alternative, which we tested on a smallish author
list of about a couple of hundred, is nearly instantaneous).

Our second question is: are we on track with this? Intuition would say, of course,
that sifting sequentially through the index, checking each document for its author,
*will* take its time. So maybe the approach is doomed? Are there other, better
approaches?

Thanks for any pointers

------

[1] <https://builds.apache.org/job/Lucene-3.x/javadoc/all/org/apache/lucene/search/Filter.html?is-external=true>
[2] <http://lucene.apache.org/solr/api/org/apache/solr/search/SolrFilter.html>
[3] <http://wiki.apache.org/lucene-java/FilteringOptions>

-- tomás