lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Best Practices for getting Strings from a position range
Date Wed, 15 Aug 2007 15:10:06 GMT

On Aug 15, 2007, at 10:46 AM, Peter Keegan wrote:

> Grant,
> I built an index as described here:
> Many documents have only 1 or 2 rows, some have dozens.
> Here is a typical query without spans:
> +((+contents:quaker +contents:cereal) (+boost50:quaker  
> +boost50:cereal))
> +literals:co$us), sort=<custom:"feedbabe":
> RoundingScoreDocComparator@8c169d05>,"dateactiveR"!
> Here is a typical query with spans:
> +spanNear([adliterals:jb$1, adliterals:co$us], 8, false)
> +(+((+contents:quaker +contents:cereal) (+boost50:quaker  
> +boost50:cereal))
> +literals:co$us), sort=<custom:"feedbabe":
> RoundingScoreDocComparator@8c169d05>,"dateactiveR"!
> The addition of the spanNear clause caused a 10X decrease in
> throughput. I could probably change the way rows are indexed and use
> ordered terms, which seems to be a bit faster (only a 5X decrease).

In looking at the code, it makes sense that an ordered SpanNearQuery  
would be faster.

I am still trying to dig into the logic of the unordered
SpanNearQuery, as it is the only thing holding me up on adding
payload access to Spans.  I need to step through it in a debugger.  As
your stack trace showed, a lot of work goes into managing the
priority queue that is created.  I don't yet understand the relation
between the SpanCells, the "ordered" List and the PriorityQueue
"queue".  It seems the SpanCells form a linked list, the "ordered"
list is for getting the spans from the sub-queries, and the queue
seems to rearrange the ordered list.

If anyone wants to chip in with pseudocode explaining what is going
on in it, that would be helpful.
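
As a starting point for that discussion, here is a minimal sketch of
the general min-heap technique for unordered proximity matching.  It
is not taken from the Lucene source and does not model the SpanCell
linked list or the "ordered" list.  The names UnorderedNearSketch,
Cell and findMatches are hypothetical, and it assumes a single
document in which every sub-query match has length 1 and positions
are sorted.  The idea: keep one cursor per sub-query in a priority
queue ordered by start position, track the largest end seen, test the
slop against the window from the smallest start to the largest end,
then advance the cursor with the smallest start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class UnorderedNearSketch {

    /** Cursor over one sub-query's sorted match positions (hypothetical helper). */
    static final class Cell {
        final int[] positions; // sorted start positions; each match has length 1 (assumption)
        int index = 0;

        Cell(int[] positions) { this.positions = positions; }

        int start() { return positions[index]; }
        int end() { return positions[index] + 1; }

        /** Moves to the next position; returns false when exhausted. */
        boolean next() { return ++index < positions.length; }
    }

    /** Returns {start, end} windows that contain every sub-query within the slop. */
    static List<int[]> findMatches(int[][] termPositions, int slop) {
        List<int[]> matches = new ArrayList<>();
        if (termPositions.length == 0) {
            return matches;
        }

        // Min-heap ordered by the current start of each cursor.
        PriorityQueue<Cell> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.start(), b.start()));
        int totalLength = 0;            // sum of the sub-match lengths
        int maxEnd = Integer.MIN_VALUE; // largest end among all current cursors

        for (int[] positions : termPositions) {
            if (positions.length == 0) {
                return matches; // a sub-query with no positions can never match
            }
            Cell cell = new Cell(positions);
            queue.add(cell);
            totalLength += 1;
            maxEnd = Math.max(maxEnd, cell.end());
        }

        while (true) {
            // Remove the cursor with the smallest start before mutating it.
            Cell min = queue.poll();

            // Gap positions inside the window that no sub-match covers.
            if (maxEnd - min.start() - totalLength <= slop) {
                matches.add(new int[] { min.start(), maxEnd });
            }

            // Advance the smallest cursor; stop once any cursor is exhausted.
            if (!min.next()) {
                return matches;
            }
            maxEnd = Math.max(maxEnd, min.end());
            queue.add(min);
        }
    }
}
```

For example, with positions {1, 10} for one term and {3, 20} for the
other, findMatches(..., 2) yields the single window {1, 4}, and a slop
of 8 also admits {3, 11}.  Every advance costs a heap removal and
re-insertion, which is at least consistent with the priority queue
work showing up in the stack trace.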
