lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: SpanQuery and Spans optimizations
Date Thu, 06 Aug 2009 18:31:39 GMT
With a single search one might end up collecting lots of span info
that will be thrown away because the document does not score high
enough to make it into the best hits.

So I think the best way is to first collect the best hits in the usual
way, and then get the spans of the query a second time (effectively
running it once more, but now without a SpanScorer in between), using
the doc numbers of the best hits as a filter while collecting all the
begin/end positions.

This second phase, which applies a Filter to a Spans, has been
requested before, so it might be a useful addition.
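A minimal sketch of that second phase, assuming the first phase (the normal scored search) has already produced the top doc ids. Because skipTo() only moves forward, the ids are sorted ascending and the spans are walked exactly once, keeping only matches in those docs. `SimpleSpans` and `ArraySpans` are hypothetical in-memory stand-ins that mirror the next/skipTo/doc/start/end methods of Lucene's Spans (including skipTo() always moving past the current match first); they are not Lucene classes, and no Lucene Filter is used here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TopHitSpans {

    /** Hypothetical stand-in for the forward-only iteration methods of Lucene's Spans. */
    public interface SimpleSpans {
        boolean next();
        boolean skipTo(int target);
        int doc();
        int start();
        int end();
    }

    /** In-memory spans over {doc, start, end} triples, sorted by doc then start. */
    public static final class ArraySpans implements SimpleSpans {
        private final int[][] spans;
        private int pos = -1;

        public ArraySpans(int[][] spans) {
            this.spans = spans;
        }

        public boolean next() {
            return ++pos < spans.length;
        }

        // Always moves past the current match, then advances until doc() >= target.
        public boolean skipTo(int target) {
            do {
                if (!next()) return false;
            } while (doc() < target);
            return true;
        }

        public int doc() { return spans[pos][0]; }
        public int start() { return spans[pos][1]; }
        public int end() { return spans[pos][2]; }
    }

    /** Second phase: one forward pass, keeping only spans whose doc is a top hit. */
    public static List<int[]> collectForDocs(SimpleSpans spans, int[] topDocIds) {
        int[] targets = topDocIds.clone();
        Arrays.sort(targets);  // skipTo() only moves forward, so visit docs in id order
        List<int[]> out = new ArrayList<>();
        boolean started = false;
        for (int target : targets) {
            // Only skip when behind the target: calling skipTo() while already on
            // the target doc would jump past that doc's first span.
            if (!started || spans.doc() < target) {
                started = true;
                if (!spans.skipTo(target)) break;  // no spans at or after target
            }
            while (spans.doc() == target) {
                out.add(new int[] {spans.doc(), spans.start(), spans.end()});
                if (!spans.next()) return out;  // spans exhausted
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] data = {{1, 0, 2}, {3, 4, 6}, {3, 10, 12}, {5, 1, 3}, {8, 7, 9}};
        // Pretend the scored search ranked doc 8 first and doc 3 second.
        for (int[] h : collectForDocs(new ArraySpans(data), new int[] {8, 3})) {
            System.out.println("doc=" + h[0] + " start=" + h[1] + " end=" + h[2]);
        }
    }
}
```

Run as-is, main prints the three spans of docs 3 and 8 in doc id order; mapping them back to score order is a lookup by doc id. The guard before skipTo() is the design point: after draining one doc, the spans may already sit on the next target.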

Regards,
Paul Elschot
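The alternative Grant raises in the quoted message below, a Collector callback that receives span info during the single scored pass, could look roughly like this sketch. `SpanCollector`, `TopN` and `search` are hypothetical names, not Lucene API, and the per-doc score is simply the match count, a placeholder for what SpanScorer actually computes. It also makes the objection above concrete: every matching doc's spans are materialized before its score decides whether they are kept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopSpanCollector {

    /** Hypothetical callback receiving each matching doc with its {start, end} spans. */
    public interface SpanCollector {
        void collect(int doc, float score, List<int[]> spans);
    }

    /** Keeps span info only for the n best-scoring docs. */
    public static final class TopN implements SpanCollector {
        private static final class Hit {
            final int doc;
            final float score;
            final List<int[]> spans;

            Hit(int doc, float score, List<int[]> spans) {
                this.doc = doc;
                this.score = score;
                this.spans = spans;
            }
        }

        private final int n;
        // Min-heap on score, so the weakest kept hit is evicted first.
        private final PriorityQueue<Hit> heap =
            new PriorityQueue<Hit>((a, b) -> Float.compare(a.score, b.score));

        public TopN(int n) {
            if (n < 1) throw new IllegalArgumentException("n must be >= 1");
            this.n = n;
        }

        @Override
        public void collect(int doc, float score, List<int[]> spans) {
            if (heap.size() < n) {
                heap.add(new Hit(doc, score, spans));
            } else if (score > heap.peek().score) {
                heap.poll();  // the evicted doc's buffered spans are thrown away
                heap.add(new Hit(doc, score, spans));
            }
        }

        /** Doc id -> {start, end} pairs for the kept docs. */
        public Map<Integer, List<int[]>> result() {
            Map<Integer, List<int[]>> out = new HashMap<>();
            for (Hit h : heap) {
                out.put(h.doc, h.spans);
            }
            return out;
        }
    }

    /** One pass over {doc, start, end} triples sorted by doc; score = match count (placeholder). */
    public static void search(int[][] spans, SpanCollector collector) {
        int i = 0;
        while (i < spans.length) {
            int doc = spans[i][0];
            List<int[]> buffered = new ArrayList<>();
            while (i < spans.length && spans[i][0] == doc) {
                buffered.add(new int[] {spans[i][1], spans[i][2]});
                i++;
            }
            collector.collect(doc, buffered.size(), buffered);
        }
    }

    public static void main(String[] args) {
        int[][] data = {
            {1, 0, 2}, {3, 4, 6}, {3, 10, 12}, {5, 1, 3},
            {8, 7, 9}, {8, 11, 13}, {8, 20, 21}
        };
        TopN top = new TopN(2);
        search(data, top);
        for (Map.Entry<Integer, List<int[]>> e : top.result().entrySet()) {
            System.out.println("doc=" + e.getKey() + " spans=" + e.getValue().size());
        }
    }
}
```

With n = 2, main keeps docs 3 and 8 and drops the spans buffered for docs 1 and 5. Memory held at the end is bounded by n docs, but the cost of building every doc's span list is still paid during the pass.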


On Thursday 06 August 2009 20:04:11 Grant Ingersoll wrote:
> seek() seems somewhat doable, although inefficient: the underlying
> TermPositions supports seek, but I think that would really only
> allow us to go back to the beginning (besides the fact that Spans
> is an interface, so adding a method would break back compat, ugh!).
> The Collector route seems more promising, and since that API isn't
> fixed yet, it might be more doable.  It could either be done right
> on Collector or by introducing something like a SpanCollector, but
> the latter would imply re-implementing many of the existing
> collectors.  Not sure what's involved just yet.
> 
> -Grant
> 
> 
> On Aug 6, 2009, at 1:50 PM, Grant Ingersoll wrote:
> 
> > I think it is a fairly common use case (relative to the rather
> > uncommon use case of using SpanQuery, that is) to want to do
> > something like:
> >
> > ...
> > SpanQuery sq = ...
> > TopDocs topDocs = searcher.search(sq, 10);
> > Spans spans = sq.getSpans(searcher.getIndexReader());
> >
> > for (int i = 0; i < topDocs.scoreDocs.length; i++) {
> > 	// NOTE: seek() does not exist as a method, only skipTo(), and
> > 	// skipTo() can only go forward, so this CODE DOESN'T WORK!!!!!!
> > 	spans.seek(topDocs.scoreDocs[i].doc);
> > 	// Do something with the info at that span
> > }
> >
> > Yet, this really isn't possible because Spans.skipTo() only moves
> > forward, while the hits in topDocs come back in score order rather
> > than in doc id order.  So, you are left trying to marry running the
> > search with moving around in the Spans, or some other rather clunky
> > mechanism, and this code is almost always really ugly.
> > Alternatively, people forgo the search() part and just go straight
> > to the spans, but then they miss out on scores.
> >
> > It just has never felt right to me, but I am not seeing a better way
> > of doing it at the moment, so I thought I would throw it out to the
> > list to see what people think.  That is, how can we generate a Spans
> > object that is backed by the order in a ScoreDoc array?  The thing
> > is, in order to run the SpanQuery, we iterate over the Spans anyway,
> > don't we?  What I would really like, when I am running SpanQuerys,
> > is to be able to tell the search to preserve the spans by hanging
> > them off of something (maybe the Collector could have a callback
> > that allows me to collect span info).  (Not sure if that makes
> > sense.)  I realize this would use extra memory, but that is probably
> > a cost I'm willing to pay.  Alternatively, we need to add a seek()
> > method to Spans and pay the cost of thrashing.
> >
> > Thoughts?  Am I off base here or missing something?
> >
> > -Grant
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >

