lucene-java-user mailing list archives

From Sirish Vadala <sirishre...@gmail.com>
Subject Re: Problem using TopFieldCollector
Date Mon, 14 Jun 2010 17:20:29 GMT

Thanks for the response.

Yeah, eventually I chose to extend the Collector class, since none of the
other collectors, viz. TopFieldCollector and TopDocsCollector, allow me to
extend them and override.

I could not grasp exactly what the following means:


Rebecca Watson wrote:
> 
> i keep a copy of the current indexreader + docBase, and used the
> indexreader.document
> method to get the doc/field values required in the collect method.
> 
> note that the docBase is used to keep/get the global docid but the doc
> passed
> to the .collect method relates to the current indexreader. i.e.
> global-docid = docBase + docid
> 

The documentation says:

NOTE: The doc that is passed to the collect method is relative to the
current reader. If your collector needs to resolve this to the docID space
of the Multi*Reader, you must re-base it by recording the docBase from the
most recent setNextReader call.

In my current collect method implementation, I use searcher.doc(doc) as
shown below:

	public void collect(int doc) throws IOException {
		try {
			... ... ... ... ...
			Document document = searcher.doc(doc);
			... ... ... ... ...
		} catch (CorruptIndexException e) {
			System.err.println("ERROR: " + e.getMessage());
		} catch (IOException e) {
			System.err.println("ERROR: " + e.getMessage());
		}
	}

Does this mean I am getting a wrong document, since I am not adding the
docBase?
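
To make the question concrete: as far as I understand the note quoted above,
searcher.doc() expects an ID in the top-level docID space, while collect()
receives a segment-relative one. If that reading is right, the lookup would
need docBase + doc (or the current reader's own document method, as Rebecca
describes). Below is a minimal sketch of the re-basing arithmetic only. It is
a toy model, not the Lucene API: DocBaseSketch, RebasingCollector and search
are hypothetical names, and the segment sizes are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of per-segment doc IDs. This is NOT the Lucene API. */
final class DocBaseSketch {

    /** Hypothetical stand-in for a collector that re-bases segment-relative IDs. */
    static final class RebasingCollector {
        private int docBase; // start of the current segment in the global ID space
        private final List<Integer> globalIds = new ArrayList<>();

        // Plays the role of setNextReader: remember where this segment starts.
        void setNextReader(int docBase) {
            this.docBase = docBase;
        }

        // Plays the role of collect: 'doc' is relative to the current segment.
        void collect(int doc) {
            globalIds.add(docBase + doc); // global-docid = docBase + docid
        }

        List<Integer> globalIds() {
            return globalIds;
        }
    }

    /** Fake search: localHits[s] holds the segment-relative hits of segment s. */
    static List<Integer> search(int[] segmentSizes, int[][] localHits) {
        RebasingCollector collector = new RebasingCollector();
        int docBase = 0;
        for (int s = 0; s < segmentSizes.length; s++) {
            collector.setNextReader(docBase);
            for (int doc : localHits[s]) {
                collector.collect(doc);
            }
            docBase += segmentSizes[s]; // the next segment starts after this one
        }
        return collector.globalIds();
    }

    public static void main(String[] args) {
        // Three made-up segments of 100, 50 and 25 documents.
        List<Integer> ids = search(new int[] {100, 50, 25},
                                   new int[][] {{3}, {3}, {0, 24}});
        System.out.println(ids); // prints [3, 103, 150, 174]
    }
}
```

In this model, local doc 3 in the second segment maps to global doc 103.
Without the docBase offset it would be confused with doc 3 of the first
segment, which is the kind of mix-up I am asking about.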

I will do some research and use a hitqueue to sort the records in the collector.
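
A rough sketch of what I have in mind for the hitqueue, written with plain
java.util classes rather than Lucene's own queue: a bounded priority queue
that keeps the best N hits by a sort value and stores the global doc ID. All
names here (TopNSketch, Hit, TopNCollector, topHits) are hypothetical, and the
sort value is passed into collect directly, where a real collector would have
to obtain it itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy top-N collector built on java.util.PriorityQueue. NOT the Lucene API. */
final class TopNSketch {

    /** Hypothetical hit record: a global doc ID plus the value to sort on. */
    static final class Hit {
        final int globalDoc;
        final String sortValue;

        Hit(int globalDoc, String sortValue) {
            this.globalDoc = globalDoc;
            this.sortValue = sortValue;
        }
    }

    /** Keeps the N hits with the smallest sort values; ties broken by doc ID. */
    static final class TopNCollector {
        private static final Comparator<Hit> ORDER =
                Comparator.comparing((Hit h) -> h.sortValue)
                          .thenComparingInt(h -> h.globalDoc);

        private final int n;
        // Reversed order, so the head of the queue is the worst retained hit.
        private final PriorityQueue<Hit> queue = new PriorityQueue<>(ORDER.reversed());
        private int docBase;

        TopNCollector(int n) {
            this.n = n;
        }

        void setNextReader(int docBase) {
            this.docBase = docBase;
        }

        void collect(int doc, String sortValue) {
            queue.add(new Hit(docBase + doc, sortValue)); // store the global ID
            if (queue.size() > n) {
                queue.poll(); // evict the current worst hit
            }
        }

        /** Returns the retained hits in ascending sort order. */
        List<Hit> topHits() {
            List<Hit> out = new ArrayList<>(queue);
            out.sort(ORDER);
            return out;
        }
    }
}
```

The queue never holds more than N + 1 entries, so memory stays bounded no
matter how many documents match. The sorted result is read back through
topHits() once the search has finished.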

Also, I haven't done any profiling yet to see the difference in cost due to
the field loads. I will profile the test case and check the difference.
Most likely, though, I will need to filter the records according to user
requirements, which eventually leads to a field load anyway.

One more discovery though :)
Now I cannot use collector.topDocs().scoreDocs if I extend Collector. It looks
like I have to save the records in another variable in the collector and
retrieve them from there.

Changing the entire implementation and specification makes it difficult to
migrate.

Thanks anyway.


Rebecca Watson wrote:
> 
> you sort in the collector anyway? -- using a custom hitqueue? in which
> case you'd
> use the global docid in the hitqueue / any filters created through the
> collector.
> 
> also, we ended up having to re-engineer our system so that we didn't
> use field-loads
> as this was a bottleneck in our system... maybe you should think about
> merging
> the two advanced/text documents together even if this duplicates
> information in your
> index so you don't have to do field loads...
> not sure if you've profiled to see the difference in cost?
> 
> hope that helps,
> 
> bec :)
> 
