lucene-solr-user mailing list archives

From John Smith <>
Subject Actual (specific) RT Search?
Date Thu, 17 Mar 2016 08:27:56 GMT

The purpose of the project is an actual RT (real-time) search, not NRT,
but with a specific condition: when an updated document meets a fixed
criterion, it should be filtered out of future results (no reuse of the
document). This criterion is present in the search query, but of course
it doesn't work for uncommitted documents.

What I wrote is a combination of the following:
- an UpdateRequestProcessor in the update chain storing the document
unique key in a local cache when the condition is met
- a postCommit listener clearing the cache
- a PostFilter collecting documents that aren't found in the cache,
activated in the search query as a fq parameter
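
To make the three pieces above concrete, here is a minimal sketch of the shared cache only, written in plain Java without any Solr classes. The class and method names (ExcludedIdCache, markExcluded, onCommit, shouldCollect) are hypothetical and chosen for illustration; the comments say which Solr component would call each method in the design described above.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Solr: the cache shared by the
// update processor, the postCommit listener and the PostFilter.
final class ExcludedIdCache {
    // Concurrent set, because updates and searches run on different threads.
    private final Set<String> ids = ConcurrentHashMap.newKeySet();

    // Would be called by the UpdateRequestProcessor when the condition is met.
    void markExcluded(String uniqueKey) {
        ids.add(uniqueKey);
    }

    // Would be called by the postCommit listener: once committed, the
    // criterion in the query itself is enough to exclude the document.
    void onCommit() {
        ids.clear();
    }

    // Would be called by the PostFilter for each candidate document.
    boolean shouldCollect(String uniqueKey) {
        return !ids.contains(uniqueKey);
    }

    int size() {
        return ids.size();
    }
}
```

The lookup itself is a constant-time set membership test; what it needs as input is the unique key of each candidate document.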

Functionally it does the job, but on large indexes performance takes a
hit. The index that causes problems has 18 million documents in 13 GB,
and queries return 25,000 documents on average. The VM has 8 cores and
20 GB of RAM, and uses Nimble storage (a combination of SSD and HDD).
Without my code Solr works like a charm. My guess so far is that the
filter has to fetch the unique key for every document in the results,
which consumes a lot of resources.

What would be your advice?
- Could I use the internal document id instead of a field value? This id
would have to be available both in the UpdateRequestProcessor and in the
PostFilter: is that the case, and how can I access it? I suppose the
SolrInputDocument in the request processor doesn't have it yet anyway.
- If I reduce the autoSoftCommit maxDocs value (by how much?), would it
be wise (and feasible) to convert the PostFilter into a plain filter
query such as "*:* NOT (id:1 OR id:2)" or something similar? How could I
implement this, and how do I estimate the filter cost so that Solr
executes it at the right position?
- Maybe I took the wrong path altogether?
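
For the plain filter query option above, this is roughly what I have in mind: a small sketch in plain Java that turns the cached ids into an fq value. The class name ExclusionFilterQuery and the field name "id" are assumptions for illustration. The cache=false and cost local params are the standard Solr ones for filter queries; the cost value of 50 used below is an arbitrary placeholder, not a measured estimate.

```java
import java.util.Collection;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: builds the value of an fq
// parameter that excludes a set of ids.
final class ExclusionFilterQuery {

    // Quote an id as a phrase, escaping backslashes and double quotes.
    static String quote(String id) {
        return "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // cost only orders this filter relative to other non-cached filters.
    static String build(Collection<String> excludedIds, int cost) {
        if (excludedIds.isEmpty()) {
            return "*:*"; // nothing to exclude
        }
        String clauses = excludedIds.stream()
                .map(ExclusionFilterQuery::quote)
                .collect(Collectors.joining(" OR "));
        // cache=false: the id list changes with every update, so caching
        // the resulting filter would only churn the filterCache.
        return "{!cache=false cost=" + cost + "}*:* NOT id:(" + clauses + ")";
    }
}
```

With ids 1 and 2 and a cost of 50, this yields {!cache=false cost=50}*:* NOT id:("1" OR "2"). The number of OR clauses would presumably have to stay under the maxBooleanClauses setting, which is one reason the autoSoftCommit maxDocs value matters here.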

Thanks in advance
