lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: SpanNearQuery's spans & payloads
Date Sat, 12 Sep 2009 11:42:52 GMT

On Sep 12, 2009, at 5:12 AM, Michael McCandless wrote:

> OK thanks for the responses.  This is indeed tricky stuff!
> On Sat, Sep 12, 2009 at 12:28 AM, Mark Miller wrote:
>> They start at the left and march right - each Span always starting
>> after the last started,
> That's not quite always true -- eg I got span 1-8, twice, once I added
> "b" as a clause to the SNQ.
>> You might want exhaustive for highlighting as well - but it's
>> different algorithms ...
> Yeah, how we would represent spans for highlighting is tricky... we
> had discussed this ("how to represent spans for aggregate queries")
> recently, I think under LUCENE-1522.
> I think we'd have to return a tree structure, that mirrors the query's
> tree structure, to hold the spans, rather than try to enumerate
> ("denormalize") all possible expansions.  Each leaf node would hold
> actual data (position, term, payload, etc.), and then the tree nodes
> would express how they are and'd/or'd/near'd together.  My app could then
> walk the tree to compute any combination I wanted.
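
As a rough illustration of the tree idea quoted above, something like the
following might work. This is only a sketch: SpanNode, SpanLeaf and SpanGroup
are hypothetical names made up for this example, not existing Lucene classes,
and the half-open [start, end) convention is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; none of these exist in Lucene.
abstract class SpanNode {
    abstract int start();                           // first position covered (inclusive)
    abstract int end();                             // one past the last position covered
    abstract void collectLeaves(List<SpanLeaf> out);
}

// A leaf holds the actual data: term, position and (optional) payload.
final class SpanLeaf extends SpanNode {
    final String term;
    final int position;
    final byte[] payload;                           // may be null

    SpanLeaf(String term, int position, byte[] payload) {
        this.term = term;
        this.position = position;
        this.payload = payload;
    }

    int start() { return position; }
    int end() { return position + 1; }
    void collectLeaves(List<SpanLeaf> out) { out.add(this); }
}

// An inner node records how its children were combined in the query.
final class SpanGroup extends SpanNode {
    enum Op { AND, OR, NEAR }

    final Op op;
    final List<SpanNode> children;

    SpanGroup(Op op, List<SpanNode> children) {
        if (children.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one child");
        }
        this.op = op;
        this.children = new ArrayList<>(children);
    }

    int start() {
        int min = Integer.MAX_VALUE;
        for (SpanNode c : children) min = Math.min(min, c.start());
        return min;
    }

    int end() {
        int max = Integer.MIN_VALUE;
        for (SpanNode c : children) max = Math.max(max, c.end());
        return max;
    }

    // Depth-first walk, so the app can compute any combination it wants.
    void collectLeaves(List<SpanLeaf> out) {
        for (SpanNode c : children) c.collectLeaves(out);
    }
}
```

An app would walk such a tree (for example via collectLeaves) and decide for
itself which combinations of leaves to highlight or score, instead of the
query enumerating every expansion up front.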
>> In the end, I accepted my definition of works as - when I ask for
>> the payloads back, will I end up with a bag of all the payloads that
>> the Spans touched. I think you do.
> Yeah I think you do, except each payload is only returned once.  So
> it's only the first span that hits a payload that will return it.
> So it sounds like SNQ just isn't guaranteed to be exhaustive in how it
> enumerates the spans, eg I'll never see that 2nd occurrence of "k",
> nor its associated payload.

That is my understanding as well.  If Doug and Paul chime
in, maybe we will know better.

That being said, I think it is reasonable to want to have an  
exhaustive list of matches, even when they overlap.  We could simply
create a new SpanNear that does this.
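
To make "exhaustive" concrete, here is a minimal standalone sketch. It does
not use the Lucene API. ExhaustiveNear is a hypothetical name, and the match
rule (covered width minus the number of clauses must be <= slop, clauses in
any order) is an assumption made for the example, not a statement of how
SpanNearQuery is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene.
final class ExhaustiveNear {

    /**
     * Returns a {start, end} pair (end exclusive) for every way of picking
     * one position per clause such that (width - number of clauses) <= slop.
     * Overlapping matches are all reported, and the same range can appear
     * more than once if different combinations happen to cover it.
     */
    static List<int[]> enumerate(int[][] positionsPerClause, int slop) {
        List<int[]> out = new ArrayList<>();
        if (positionsPerClause.length == 0) {
            return out;                              // nothing to match
        }
        recurse(positionsPerClause, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop, out);
        return out;
    }

    private static void recurse(int[][] pos, int clause, int min, int max,
                                int slop, List<int[]> out) {
        if (clause == pos.length) {
            int width = max - min + 1;               // positions covered by this combination
            if (width - pos.length <= slop) {
                out.add(new int[] {min, max + 1});
            }
            return;
        }
        // Try every occurrence of this clause, not just the first one.
        for (int p : pos[clause]) {
            recurse(pos, clause + 1, Math.min(min, p), Math.max(max, p), slop, out);
        }
    }
}
```

The cost grows with the product of the per-clause occurrence counts, which
is presumably one reason to keep this as a separate, opt-in query rather
than changing the existing behavior.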

> For now I'll just match this behavior ("can only load payload once")
> in all codecs in LUCENE-1458... the test passes again once I do that.
>> I meant, all those Spans came from one query - so you got your bag
>> of payloads right? If each Span was a separate entity, it would
>> obviously be way wrong - but from a single SpanQuery, at least you
>> got all the payloads in some form :)
> Right, this is all one query... but the payload for the 2nd
> occurrence of "k" was never included in any span so I didn't get "all"
> payloads.
> Maybe if/once we incorporate spans into Lucene's normal queries
> (optionally, so there's no performance hit if you don't ask for them)
> we can re-visit these issues.

Good luck with that!  :-)  The SpanQuery classes themselves ask for them
as it is now.  The bigger bugaboo to fix, I think, is the use case I laid
out a bit ago, where it is a real pain to coalesce the results of
running the query with effective access to the Spans without having
to constantly reset/skipTo.
