lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Recommendation for doing a search plus collecting extra information?
Date Mon, 12 Oct 2015 06:28:14 GMT
Hi,

it may sound a bit stupid, but you can do the following:

If you sort on a docvalues (previously FieldCache) field in Lucene, the returned TopFieldDocs
also contains the field values that were sorted against. The ScoreDoc instances in this collection
are actually FieldDoc instances (downcast them): https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/search/FieldDoc.html

So my suggestion would be: sort primarily by score (SortField.FIELD_SCORE), but add a secondary
sort field for the docvalues field you want to be part of your results. The results will
still be sorted primarily by score, so you should get them in the right order (the secondary
field only breaks ties between equal scores), but you get the docvalues field as part of your
TopFieldDocs (https://lucene.apache.org/core/5_3_1/core/org/apache/lucene/search/TopFieldDocs.html)
collection after downcasting each ScoreDoc to FieldDoc (the sort values are stored in its
Object[] fields array). Take the second element of FieldDoc.fields and cast it to your data type.
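
A minimal sketch of the idea above, assuming Lucene 5.3.x and a numeric docvalues field
indexed with NumericDocValuesField. The class name ScoreWithDocValues, the method
collectExtras and the parameter dvField are made up for illustration; they are not
part of Lucene.

```java
import java.io.IOException;

import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopFieldDocs;

// Hypothetical helper, not part of Lucene.
public class ScoreWithDocValues {

    /**
     * Runs the query sorted by score first and by the docvalues field second,
     * then reads the docvalues value of each hit back out of the sort values.
     * Assumes dvField is a numeric docvalues field whose values fit in an int.
     */
    public static int[] collectExtras(IndexSearcher searcher, Query query,
                                      String dvField, int n) throws IOException {
        Sort sort = new Sort(
                SortField.FIELD_SCORE,                        // primary: relevance
                new SortField(dvField, SortField.Type.INT));  // secondary: docvalues

        TopFieldDocs top = searcher.search(query, n, sort);

        int[] extras = new int[top.scoreDocs.length];
        for (int i = 0; i < top.scoreDocs.length; i++) {
            // When sorting by fields, each ScoreDoc is really a FieldDoc.
            FieldDoc hit = (FieldDoc) top.scoreDocs[i];
            // fields[0] holds the score, fields[1] the docvalues sort value.
            extras[i] = ((Number) hit.fields[1]).intValue();
        }
        return extras;
    }
}
```

The returned array is in the same order as top.scoreDocs, so extras[i] belongs to hit i.
Documents without a value for dvField get the sort's missing-value default, so they
cannot be told apart from documents that really store that value.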

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Trejkaz [mailto:trejkaz@trypticon.org]
> Sent: Monday, October 12, 2015 2:25 AM
> To: Lucene Users Mailing List
> Subject: Re: Recommendation for doing a search plus collecting extra
> information?
> 
> On Mon, Oct 12, 2015 at 6:32 AM, Alan Woodward <alan@flax.co.uk> wrote:
> > Hi Trejkaz,
> >
> > You can still use a standard collector if you don’t need to worry
> > about multi-threaded search.  It sounds as though what you want to do
> > is implement your own Collector that will read and record docvalues hits,
> > and use MultiCollector to wrap it and a standard TopDocsCollector together.
> 
> I guess the benefit of doing it directly at the Collector is that the results will
> come in doc ID order, so any I/O I'm doing would be local to the previous I/O?
> Which makes sense, and fetching the values seems easy enough, but then
> the order I get the results is not the order they will come back in the search,
> so I have to find a fairly efficient way to map int->int so that I can look them
> up later.
> 
> What would seem ideal here is extending ScoreDoc to put my new int in that,
> so that it's stored along with the same object that gets sorted and ultimately
> ends up in the array (plus the extra storage requirement would be as low as
> possible), but there the ScoreDoc is created by
> HitQueue#getSentinelObject() and there is no way to get a different subclass
> of HitQueue in TopScoreDocCollector. So I think this route would require
> reimplementing pretty much all of TopScoreDocCollector. I guess it isn't very
> large, but I worry about future API changes when messing with internal stuff.
> 
> TX
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

