lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Precedence parser: NOT/AND, disableCoord
Date Tue, 15 Mar 2005 20:57:17 GMT
On Tuesday 15 March 2005 01:55, Erik Hatcher wrote:
> 
> On Mar 13, 2005, at 2:35 AM, Paul Elschot wrote:
> > I had a short look through the new precedence parser
> > and noticed a possible issue.
> >
> > Adding this in the TestPrecedenceParser testSimple() method:
> >
> >     assertQueryEquals("NOT a AND b", null, "-a +b");
> >     // currently parses as -(+a +b)
> >
> > fails the test because it parses as NOT (a AND b).
> >
> > This might be improved by using the modifier (+, - , NOT) at the
> > clause level instead of at the andExpression level.
> 
> Paul - I attempted to do this, but my efforts were not successful, and 
> actually broke more tests than it corrected, so I reverted my local 
> changes.
> 
> I'd welcome others to give it a try, though.  I'm still learning how to 
> accomplish things with JavaCC.

The basic rule is that the deeper a grammar construct is nested,
the higher the parsing precedence of the corresponding operator.
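
To illustrate the nesting rule, here is a minimal sketch of a
hand-written recursive-descent parser in plain Java. It is not the
JavaCC grammar of the precedence parser; the class name PrecedenceSketch
and its methods are made up for this example, and it only handles
whitespace-separated terms with OR, AND and NOT. The output is fully
parenthesized so the grouping is visible.

```java
import java.util.Arrays;
import java.util.List;

/** Toy recursive-descent parser; not the JavaCC grammar under discussion. */
class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    private PrecedenceSketch(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    /** Returns the input fully parenthesized, so the grouping is visible. */
    static String parse(String input) {
        PrecedenceSketch p = new PrecedenceSketch(input);
        String result = p.orExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return result;
    }

    private static boolean isOperator(String t) {
        return t.equals("OR") || t.equals("AND") || t.equals("NOT");
    }

    /** Consumes the next token if it equals the given keyword. */
    private boolean accept(String keyword) {
        if (pos < tokens.size() && tokens.get(pos).equals(keyword)) {
            pos++;
            return true;
        }
        return false;
    }

    // Shallowest rule: OR gets the lowest precedence.
    private String orExpr() {
        String left = andExpr();
        while (accept("OR")) {
            left = "(" + left + " OR " + andExpr() + ")";
        }
        return left;
    }

    // One level deeper: AND binds tighter than OR.
    private String andExpr() {
        String left = notExpr();
        while (accept("AND")) {
            left = "(" + left + " AND " + notExpr() + ")";
        }
        return left;
    }

    // Deepest operator rule: NOT binds tightest, so it applies to one clause.
    private String notExpr() {
        if (accept("NOT")) {
            return "(NOT " + notExpr() + ")";
        }
        return term();
    }

    private String term() {
        if (pos >= tokens.size() || isOperator(tokens.get(pos))) {
            throw new IllegalArgumentException("Term expected at token " + pos);
        }
        return tokens.get(pos++);
    }
}
```

With NOT on the deepest level, PrecedenceSketch.parse("NOT a AND b")
gives ((NOT a) AND b) rather than NOT (a AND b), and
parse("a OR b AND c") gives (a OR (b AND c)).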

> > About a year ago I tried to come up with a good syntax for
> > mixing AND/OR/NOT and +/- and brackets in a consistent
> > way, and I gave up, dropping the + and -.
> 
> I've struggled (and am still struggling) with this same thing myself.  
> AND/OR are problematic because of the nature of boolean clauses, which 
> individually have their own required/prohibited flags, not in 
> _conjunction_ with another clause.

I'd recommend using getOrQuery() and getAndQuery() methods that
map to a boolean query via the clauses, and implementing corresponding
grammar constructs that just collect the clauses: top level for OR,
next level for AND, next level for NOT.
Mixing - and NOT is no problem, but mixing + and AND is not nice.
If the mix is allowed, the parser will probably have to throw a
ParseException in some unexpected corner cases. It might, for example,
be possible to disallow + at the level of AND.
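
A rough sketch of the clause-collecting idea, in plain Java without
Lucene on the classpath. Everything here is an assumption made for
illustration: getOrQuery() and getAndQuery() build strings in the
traditional +/- notation instead of real boolean queries, the class
name ClauseCollectingSketch is invented, and IllegalArgumentException
stands in for ParseException. Each grammar level only collects its
clauses and hands them to the matching factory method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy parser whose grammar levels only collect clauses. */
class ClauseCollectingSketch {
    private final List<String> tokens;
    private int pos = 0;

    private ClauseCollectingSketch(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    static String parse(String input) {
        ClauseCollectingSketch p = new ClauseCollectingSketch(input);
        String query = p.orExpr();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos));
        }
        return query;
    }

    // Hypothetical factory: every clause is optional.
    static String getOrQuery(List<String> clauses) {
        return String.join(" ", clauses);
    }

    // Hypothetical factory: '+' marks a required clause, '-' a prohibited one.
    static String getAndQuery(List<String> clauses, List<Boolean> prohibited) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < clauses.size(); i++) {
            parts.add((prohibited.get(i) ? "-" : "+") + clauses.get(i));
        }
        return String.join(" ", parts);
    }

    private boolean accept(String keyword) {
        if (pos < tokens.size() && tokens.get(pos).equals(keyword)) {
            pos++;
            return true;
        }
        return false;
    }

    // Top level: collect the OR clauses.
    private String orExpr() {
        List<String> clauses = new ArrayList<>();
        do {
            String clause = andExpr();
            // Parenthesize nested multi-clause queries to keep them grouped.
            clauses.add(clause.contains(" ") ? "(" + clause + ")" : clause);
        } while (accept("OR"));
        return getOrQuery(clauses);
    }

    // Next level: collect the AND clauses; NOT applies to a single clause.
    private String andExpr() {
        List<String> clauses = new ArrayList<>();
        List<Boolean> prohibited = new ArrayList<>();
        do {
            prohibited.add(accept("NOT"));
            clauses.add(term());
        } while (accept("AND"));
        if (clauses.size() == 1 && !prohibited.get(0)) {
            return clauses.get(0);
        }
        for (String clause : clauses) {
            // One possible restriction: reject '+' when combined with AND.
            if (clauses.size() > 1 && clause.startsWith("+")) {
                throw new IllegalArgumentException("'+' is not allowed with AND: " + clause);
            }
        }
        return getAndQuery(clauses, prohibited);
    }

    private String term() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Term expected at end of input");
        }
        String t = tokens.get(pos);
        if (t.equals("OR") || t.equals("AND") || t.equals("NOT")) {
            throw new IllegalArgumentException("Term expected, found operator: " + t);
        }
        pos++;
        return t;
    }
}
```

In this sketch ClauseCollectingSketch.parse("NOT a AND b") returns
-a +b, and parse("a AND b OR c") returns (+a +b) c, while
parse("+a AND b") is rejected.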

The traditional boolean query with + and - is probably best
handled at the same level as OR. In some cases, e.g.:
(aa AND bb) OR cc NOT dd
the resulting query could be mapped back to a traditional one:
+aa +bb cc -dd
Would this mapping back be needed?
(This mapping back is the reverse of what I implemented
in BooleanScorer2, which maps from the traditional form to
scorers corresponding to AND, OR and NOT.)

How much compatibility with the existing parser is needed,
in particular with the coordination factor?
 
> > However, from what I see now in the precedence parser,
> > giving up might have been premature. It seems to be possible
> > to make the mix after all.
> 
> I believe it's possible, but I won't be spending time on it in the next 
> few weeks at least - anyone interested in the next generation query 
> parser is encouraged to pick up the torch.

I suppose another goal is to get the span queries in the parser.
Once the boolean OR/AND precedence works, it is also possible
to add operators with even higher precedence for the span queries.
Would you, or someone else, have any ideas on what the syntax for these
should look like? How much of the span functionality should be made
available via the query language?

Regards,
Paul Elschot


