lucene-dev mailing list archives

From: Michael McCandless
Subject: Re: SpanNearQuery's spans & payloads
Date: Sat, 12 Sep 2009 09:12:19 GMT

OK thanks for the responses.  This is indeed tricky stuff!

On Sat, Sep 12, 2009 at 12:28 AM, Mark Miller wrote:

> They start at the left and march right - each Span always starting
> after the last started,

That's not quite always true -- e.g., I got span 1-8 twice after I
added "b" as a clause to the SpanNearQuery (SNQ).

> You might want exhaustive for highlighting as well - but it's
> different algorithms ...

Yeah, how we would represent spans for highlighting is tricky... we
had discussed this ("how to represent spans for aggregate queries")
recently, I think under LUCENE-1522.

I think we'd have to return a tree structure, that mirrors the query's
tree structure, to hold the spans, rather than try to enumerate
("denormalize") all possible expansions.  Each leaf node would hold
actual data (position, term, payload, etc.), and then the tree nodes
would express how they are AND'd, OR'd, or NEAR'd together.  My app
could then walk the tree to compute any combination I wanted.
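
A minimal sketch of what such a tree might look like, in Java. Every
type and method name here (SpanNode, Leaf, Composite, leaves, example)
is hypothetical and not part of Lucene, and the positions and payload
bytes are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types exist in Lucene.
final class SpanTreeSketch {

    /** A node of the match tree; the tree has the same shape as the query tree. */
    interface SpanNode {
        int start();                         // first position covered (inclusive)
        int end();                           // one past the last position covered
        void collectLeaves(List<Leaf> out);  // depth-first, left to right
    }

    /** Leaf: one concrete term occurrence together with its payload. */
    static final class Leaf implements SpanNode {
        final String term;
        final int position;
        final byte[] payload;

        Leaf(String term, int position, byte[] payload) {
            this.term = term;
            this.position = position;
            this.payload = payload;
        }

        public int start() { return position; }
        public int end() { return position + 1; }
        public void collectLeaves(List<Leaf> out) { out.add(this); }
    }

    /** Inner node: records how its children were combined ("and", "or", "near"). */
    static final class Composite implements SpanNode {
        final String op;
        final List<SpanNode> children;

        Composite(String op, List<SpanNode> children) {
            if (children.isEmpty()) {
                throw new IllegalArgumentException("a composite needs at least one child");
            }
            this.op = op;
            this.children = children;
        }

        public int start() {
            int min = Integer.MAX_VALUE;
            for (SpanNode c : children) min = Math.min(min, c.start());
            return min;
        }

        public int end() {
            int max = Integer.MIN_VALUE;
            for (SpanNode c : children) max = Math.max(max, c.end());
            return max;
        }

        public void collectLeaves(List<Leaf> out) {
            for (SpanNode c : children) c.collectLeaves(out);
        }
    }

    /** Walks the tree and returns every leaf, so the caller can build any combination. */
    static List<Leaf> leaves(SpanNode root) {
        List<Leaf> out = new ArrayList<>();
        root.collectLeaves(out);
        return out;
    }

    /** Made-up match: "a" at position 1, near either occurrence of "k" (positions 3 and 7). */
    static SpanNode example() {
        List<SpanNode> ks = new ArrayList<>();
        ks.add(new Leaf("k", 3, new byte[] {1}));
        ks.add(new Leaf("k", 7, new byte[] {2}));
        List<SpanNode> near = new ArrayList<>();
        near.add(new Leaf("a", 1, new byte[] {0}));
        near.add(new Composite("or", ks));
        return new Composite("near", near);
    }

    public static void main(String[] args) {
        SpanNode root = example();
        System.out.println("covers " + root.start() + "-" + root.end());
        for (Leaf l : leaves(root)) {
            System.out.println(l.term + "@" + l.position);
        }
    }
}
```

Because both occurrences of "k" stay in the tree as separate leaves,
nothing has to be enumerated up front; the caller decides which
combinations to expand while walking it.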

> In the end, I accepted my definition of works as - when I ask for
> the payloads back, will I end up with a bag of all the payloads that
> the Spans touched. I think you do.

Yeah I think you do, except each payload is only returned once.  So
it's only the first span that hits a payload that will return it.
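
A toy model of that "each payload is only returned once" behavior, in
Java. This is not Lucene code: the collect method, the [start, end)
span arrays and the position-to-payload map are all assumptions made
up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation only; it does not use or mirror any Lucene API.
final class PayloadOnceSketch {

    /**
     * For each span, given as {start, end} with end exclusive, returns the
     * payloads that span is the first to touch. A position's payload is
     * handed out at most once across all spans.
     */
    static List<List<String>> collect(int[][] spans, Map<Integer, String> payloadsByPosition) {
        Set<Integer> consumed = new HashSet<>();
        List<List<String>> result = new ArrayList<>();
        for (int[] span : spans) {
            List<String> fresh = new ArrayList<>();
            for (int pos = span[0]; pos < span[1]; pos++) {
                String payload = payloadsByPosition.get(pos);
                // Set.add returns false when the position was already consumed.
                if (payload != null && consumed.add(pos)) {
                    fresh.add(payload);
                }
            }
            result.add(fresh);
        }
        return result;
    }
}
```

In this model, a span that is enumerated a second time gets an empty
list, and a payload at a position that no enumerated span covers never
comes back at all.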

So it sounds like SNQ just isn't guaranteed to be exhaustive in how it
enumerates the spans, e.g., I'll never see that 2nd occurrence of "k",
nor its associated payload.

For now I'll just match this behavior ("can only load payload once")
in all codecs in LUCENE-1458... the test passes again once I do that.

> I meant, all those Spans came from one query - so you got your bag
> of payloads right? If each Span was a separate entity, it would
> obviously be way wrong - but from a single SpanQuery, at least you
> got all the payloads in some form :)

Right, this is all one query... but the payload for the 2nd
occurrence of "k" was never included in any span, so I didn't get "all".

Maybe if/once we incorporate spans into Lucene's normal queries
(optionally, so there's no performance hit if you don't ask for them)
we can re-visit these issues.

