lucene-java-user mailing list archives

From "Mark Miller" <markrmil...@gmail.com>
Subject Re: Proximity Query Parser
Date Fri, 01 Sep 2006 17:46:28 GMT
Eric also gave me the idea of using a SpanNear with maximum slop as a
boolean 'and' to connect spans. Using this together with SpanOr seems to
make the time I spent on distributing proximity clauses a little foolish :)
Is that true? Is there any disadvantage to the max-slop SpanNear plus SpanOr
solution? Any advantage to distributing the 'and's?

- Mark

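To make the question concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class SpanSketch and its methods term, or, and near are hypothetical stand-ins for SpanTermQuery, SpanOrQuery, and an unordered SpanNearQuery. It assumes slop is the match length minus the combined clause lengths and simply ignores overlapping spans. Under those assumptions the nested form and the distributed form find the same spans, and a near with maximum slop behaves like a boolean 'and' within one field.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanSketch {
    // A span is an int[]{start, end} over token positions, end exclusive.

    /** Spans of a single term (hypothetical stand-in for SpanTermQuery). */
    public static List<int[]> term(String[] tokens, String word) {
        List<int[]> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(word)) {
                out.add(new int[] {i, i + 1});
            }
        }
        return out;
    }

    /** Union of two span lists (hypothetical stand-in for SpanOrQuery). */
    public static List<int[]> or(List<int[]> a, List<int[]> b) {
        List<int[]> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    /**
     * Unordered proximity (simplified stand-in for SpanNearQuery).
     * Keeps each pair of non-overlapping spans whose gap is at most slop.
     */
    public static List<int[]> near(List<int[]> a, List<int[]> b, int slop) {
        List<int[]> out = new ArrayList<>();
        for (int[] x : a) {
            for (int[] y : b) {
                int start = Math.min(x[0], y[0]);
                int end = Math.max(x[1], y[1]);
                // Positions inside the match not covered by either clause.
                int gap = (end - start) - (x[1] - x[0]) - (y[1] - y[0]);
                if (gap >= 0 && gap <= slop) {
                    out.add(new int[] {start, end});
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] doc = "the old map is big".split(" ");
        List<int[]> big = term(doc, "big");

        // Nested form: near(or(old, map), big), with "big" stated once.
        List<int[]> nested =
            near(or(term(doc, "old"), term(doc, "map")), big, 3);

        // Distributed form: (old ~3 big) | (map ~3 big).
        List<int[]> distributed =
            or(near(term(doc, "old"), big, 3), near(term(doc, "map"), big, 3));

        System.out.println(nested.size() == distributed.size());

        // Maximum slop: matches whenever both terms occur at all.
        System.out.println(
            !near(term(doc, "the"), big, Integer.MAX_VALUE).isEmpty());
    }
}
```

The sketch only covers two-clause, unordered proximity on a single field. Whether the two forms score identically in Lucene is a separate question that this model does not address.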

On 9/1/06, Mark Miller <markrmiller@gmail.com> wrote:
>
>  Thanks for the tip, Paul. It is embarrassing, but I only realized how
> SpanOr queries worked a day or two ago, based on a tip from Eric. The way I
> assumed it would create the spans before was just wrong, and I never
> researched further. Now I see that it would be a nice optimization for what
> I have...but I have not yet looked into how easy it will be to integrate it
> into my distribution algorithm. I do use it for multiphrase queries, however,
> based on Eric's tip. It will hopefully be pretty simple to apply it to my
> distribution, but I have not had time to check it out. I plowed this thing
> out pretty quickly and am hoping I can go back and clean up a lot of things.
> Need a short break though to pump out some other things. As I learn more
> about Lucene and JavaCC I will incorporate new methods into the parser.
>
>
> - Mark
>
>
>  On 9/1/06, Paul Elschot <paul.elschot@xs4all.nl> wrote:
> >
> > On Friday 01 September 2006 12:54, Mark Miller wrote:
> >
> > > Hi Paul,
> > >
> > > I also have to treat things differently depending on if I am in a
> > > proximity clause or boolean clause. A wildcard in a boolean is mapped
> > > to a wildcard query. A wildcard in a proximity is mapped to a regex
> > > span that has been modified to only deal with * and ?. When I run
> > > into a proximity, I collect a small tree of each clause and
> > > distribute them against each other...(old | map) ~3 big gets
> > > distributed to old ~3 big | map ~3 big. This distribution method
> > > appears to handle all
> >
> > There is no need to repeat "big". SpanQueries can be nested,
> > so when mapping like this:
> > SpanNear(SpanOr( old, map), big)
> > the query structure will only grow for truncations and fuzzy stuff.
> >
> > > boolean/proximity nesting/mixing cases for me, including: great ! "big
> > > old phrase search" ~5 (holy ~4 (big black bear)). The distribution
> > > maintains order of operations, but also obviously can create some
> > > pretty large queries.
> >
> > Regards,
> > Paul Elschot
> >
>

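As a rough illustration of the "pretty large queries" remark in the quoted thread, the sketch below counts leaf terms for a two-sided proximity clause whose sides each hold some number of alternatives. The class ClauseCount and its method names are hypothetical, and it assumes full pairwise distribution on one side and a single nested SpanOr per side on the other.

```java
public class ClauseCount {
    /**
     * Leaf terms after distribution: every left alternative is paired
     * with every right alternative, and each pair holds two terms.
     */
    public static long distributedLeaves(int left, int right) {
        return 2L * left * right;
    }

    /** Leaf terms when each side is kept as one nested 'or' clause. */
    public static long nestedLeaves(int left, int right) {
        return (long) left + right;
    }

    public static void main(String[] args) {
        // (old | map) ~3 big: two alternatives on the left, one on the right.
        System.out.println(distributedLeaves(2, 1)); // old, big, map, big
        System.out.println(nestedLeaves(2, 1));      // old, map, big
    }
}
```

The gap widens quickly once a wildcard or fuzzy term expands to many alternatives on both sides, since the distributed count grows with the product and the nested count with the sum.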