lucene-solr-user mailing list archives

From <>
Subject Filtering results based on a set of values for a field
Date Tue, 16 Aug 2011 07:56:51 GMT
Hello, Solrs

We are trying to restrict results to documents written by one or more of the
authors from a mediumish list (~2K). The document set itself is in the millions.

Apart from the obvious approach of building a huge OR-list and appending it
to the query, writing a Lucene filter[1] (or a SolrFilter[2]) seems to suggest
itself. In fact, [3] seems to strongly encourage this approach.
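
For reference, the OR-list alternative can be sketched roughly as below. This is
only an illustration: the field name "author" and the class and method names are
assumptions, and the resulting string is meant to be passed as a filter query
(fq). With ~2K authors, the number of clauses would also have to fit under Solr's
maxBooleanClauses setting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class AuthorOrQuery {

    // Quote one author name as a phrase, escaping backslashes and double quotes.
    static String quote(String author) {
        return "\"" + author.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build a clause such as author:("Jane Doe" OR "John Roe").
    // The field name is supplied by the caller (hypothetical here).
    static String buildFilterQuery(String field, List<String> authors) {
        return authors.stream()
                .map(AuthorOrQuery::quote)
                .collect(Collectors.joining(" OR ", field + ":(", ")"));
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Jane Doe", "John Roe");
        System.out.println(buildFilterQuery("author", authors));
    }
}
```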

Basically, as we understand it, the filter's getDocIdSet method gets called and
is fed index segments, "one spoonful at a time". It then decides which docs
of the segment will be accepted, setting the corresponding bits in the result (in
our case, e.g. by looking up the document's author's name in a HashMap or
something like it).
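
To make our understanding concrete, here is a minimal plain-Java sketch of the
per-segment logic. It deliberately uses no Lucene classes: the array of authors
per document is a hypothetical stand-in for however the filter would read the
field, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class SegmentAuthorSketch {

    // authorsByDocId[i] is the author of the i-th document of one segment
    // (assumption: one author per document, ids relative to the segment).
    static BitSet acceptDocs(String[] authorsByDocId, Set<String> wantedAuthors) {
        BitSet accepted = new BitSet(authorsByDocId.length);
        for (int docId = 0; docId < authorsByDocId.length; docId++) {
            // One hash lookup per document: set the bit if the author is wanted.
            if (wantedAuthors.contains(authorsByDocId[docId])) {
                accepted.set(docId);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Set<String> wanted = new HashSet<>(Arrays.asList("alice", "carol"));
        String[] segment = {"alice", "bob", "carol", "alice"};
        System.out.println(acceptDocs(segment, wanted)); // prints {0, 2, 3}
    }
}
```

The loop visits every document of the segment once, which is exactly the
sequential sifting we are worried about further down.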

Our first question is: how does it all fit together? Would it be enough to write
such a class? How do we reference it in the Solr configuration? In the query?
Should it be a Lucene Filter or a SolrFilter?

The problem is, we are experiencing very slow response times, on the order of
12 seconds per query (the OR alternative, which we tested on a smallish author
list of about a couple of hundred, is nearly instantaneous).

Our second question is: are we on track with this? Intuition would say, of course,
that sifting sequentially through the index, checking each document for its author
*will* take its time. So maybe the approach is doomed? Are there other, better
approaches?

Thanks for any pointers


[1] <>
[2] <>
[3] <>

-- tomás