lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: SpanNearQuery's spans & payloads
Date Sat, 12 Sep 2009 16:07:53 GMT
On Saturday 12 September 2009 14:40:28 Mark Miller wrote:
> Michael McCandless wrote:
> > OK thanks for the responses.  This is indeed tricky stuff!
> >
> > On Sat, Sep 12, 2009 at 12:28 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >
> >   
> >> They start at the left and march right - each Span always starting
> >> after the last started,
> >>     
> >
> > That's not quite always true -- eg I got span 1-8, twice, once I added
> > "b" as a clause to the SNQ.
> >   
> Mmm - right - depends on how you look at it, I think. It is less simple
> with terms at multiple positions, in that each Span no longer starts
> at the *position* after the last - but if you line up the terms like you
> did, it's still the same: the first 1-8 starts at the first term at
> pos 1, and the next 1-8 starts at the second term at pos 1. One starts
> after the other (though if you think in Lucene positions, I realize they
> virtually start at the same spot).
> >   
> >> You might want exhaustive for highlighting as well - but its
> >> different algorithms ...
> >>     
> >
> > Yeah, how we would represent spans for highlighting is tricky... we
> > had discussed this ("how to represent spans for aggregate queries")
> > recently, I think under LUCENE-1522.
> >
> > I think we'd have to return a tree structure, that mirrors the query's
> > tree structure, to hold the spans, rather than try to enumerate
> > ("denormalize") all possible expansions.  Each leaf node would hold
> > actual data (position, term, payload, etc.), and then the tree nodes
> > would express how they are and'd/or'd/near'd together.  My app could then
> > walk the tree to compute any combination I wanted.
> >
> >   
> >> In the end, I accepted my definition of works as - when I ask for
> >> the payloads back, will I end up with a bag of all the payloads that
> >> the Spans touched. I think you do.
> >>     
> >
> > Yeah I think you do, except each payload is only returned once.  So
> > it's only the first span that hits a payload that will return it.
> >
> > So it sounds like SNQ just isn't guaranteed to be exhaustive in how it
> > enumerates the spans, eg I'll never see that 2nd occurrence of "k",
> > nor its associated payload.
> >   
> Not only not guaranteed, it's just not going to happen - it's not
> how spans match. If I say find n within 300 of m with the following:
> 
> n m m m m m m m m m m m m  m m m m m m m m m m m m m m m m m m m m m m
> m  m m m m m m m m m m m
> 
> Only the first m will match. It will start at the left, find the n, then
> say great, an m within 300, this doc matches, we are done. There is
> not another n to start on or finish on to the right. It doesn't then
> touch the next 300 m's - that's just the way Doug implemented them, from
> what I can tell. It's only exhaustive from the left. Now find m within
> 300 of n, where order matters (m first):
> 
> m m m m m m m m m m m m m m m m m m n
> 
> This will be a bunch of spans - start at the left - the first m to n
> matches, then the second m - n matches, then the third m to n matches,
> and so on as we move right.

In the ordered case that last one should only match once, against
the last m.
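
To make that concrete, here is a toy model in Python of the ordered,
two-term case. It is only a sketch under assumptions: the function name
ordered_near_spans and the token-list input are invented for
illustration, positions are 0-based, span ends are exclusive, and the
"shrink to the shortest match" step is a simplification, not Lucene's
actual NearSpansOrdered code. It does not model the unordered case.

```python
def ordered_near_spans(tokens, first, second, slop):
    """Toy model of an ordered two-term near match (hypothetical helper).

    Returns (start, end) spans with an exclusive end, where `first`
    occurs before `second` with at most `slop` positions between them.
    """
    a = [i for i, t in enumerate(tokens) if t == first]
    b = [i for i, t in enumerate(tokens) if t == second]
    spans = []
    i = j = 0
    while i < len(a) and j < len(b):
        # Stretch: move the second term right until it follows the first.
        if b[j] <= a[i]:
            j += 1
            continue
        # Shrink: skip to the last occurrence of `first` before b[j],
        # so only the shortest span ending at b[j] is reported.
        while i + 1 < len(a) and a[i + 1] < b[j]:
            i += 1
        # Gap between the end of the first term and the start of the second.
        if b[j] - a[i] - 1 <= slop:
            spans.append((a[i], b[j] + 1))
        # Move on; the earlier occurrences of `first` are never revisited.
        i += 1
    return spans


if __name__ == "__main__":
    doc = ["m"] * 18 + ["n"]
    print(ordered_near_spans(doc, "m", "n", 300))
```

With eighteen m's followed by one n, this model reports a single span,
from the last m to the n, rather than one span per m.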

Regards,
Paul Elschot

> > For now I'll just match this behavior ("can only load payload once")
> > in all codecs in LUCENE-1458... the test passes again once I do that.
> >
> >   
> >> I meant, all those Spans came from one query - so you got your bag
> >> of payloads right? If each Span was a separate entity, it would
> >> obviously be way wrong - but from a single SpanQuery, at least you
> >> got all the payloads in some form :)
> >>     
> >
> > Right, this is all one query... but the payload for the 2nd
> > occurrence of "k" was never included in any span so I didn't get "all"
> > payloads.
> >   
> You got all the payloads the query matched - I think you need a
> different query (or we change the Spans algorithm completely).
>
> > Maybe if/once we incorporate spans into Lucene's normal queries
> > (optionally, so there's no performance hit if you don't ask for them)
> > we can re-visit these issues.
> >
> > Mike
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-dev-help@lucene.apache.org
> >
> >   
> 
> 
> -- 
> - Mark
> 
> http://www.lucidimagination.com
> 

