lucene-java-user mailing list archives

From Morus Walter <>
Subject Re: QueryParser refactoring
Date Tue, 08 Mar 2005 14:14:37 GMT
Erik Hatcher writes:
> On Mar 8, 2005, at 4:38 AM, Morus Walter wrote:
> >> I created a modified Query->String converter for my current day time
> >> project (as I use a String representation for the most recently used
> >> drop-down that is stored as a client-side cookie) that explicitly puts
> >> in "OR" between SHOULD BooleanClauses.
> >>
> > You cannot do that in the general case. BooleanQuery allows for queries
> > that cannot be expressed only using AND/OR/NOT.
> > E.g. `a b +c'. That's not `(a OR b) AND c' nor any other boolean 
> > expression.
> > It's also not expressible in QP syntax for default operator AND.
> I just added this bit of code to my QueryConverterTest:
>      BooleanQuery bq = new BooleanQuery();
>      bq.add(new TermQuery(new Term("field", "a")), BooleanClause.Occur.SHOULD);
>      bq.add(new TermQuery(new Term("field", "b")), BooleanClause.Occur.SHOULD);
>      bq.add(new TermQuery(new Term("field", "c")), BooleanClause.Occur.MUST);
>      assertEquals("a OR b +c", QueryConverter.convert(bq, "field"));
>      System.out.println(QueryParser.parse(
>          QueryConverter.convert(bq, "field"), "field", new SimpleAnalyzer()));
> The output is:
> 	a OR b +c
> And the test passes and the query is expressible in QP syntax for 
> AND.... unless I'm missing something obvious here.
OK, you're right: if you mix boolean operators and flags, you can do that.
I don't like that because it's hard to explain.
And if you read it as `a OR b AND c', which is what it looks like,
you're lost.
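
A minimal sketch of why `a b +c' is not `(a OR b) AND c'. It is a toy model of clause matching, not Lucene code: it assumes the usual rule that optional (SHOULD) clauses stop being required for a match once a required (MUST) clause is present, and all class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: no Lucene classes are used, all names are hypothetical.
final class BooleanClauseDemo {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final String term;
        final Occur occur;
        Clause(String term, Occur occur) { this.term = term; this.occur = occur; }
    }

    // The flag query `a b +c': a and b are optional, c is required.
    static List<Clause> flagQuery() {
        return Arrays.asList(
            new Clause("a", Occur.SHOULD),
            new Clause("b", Occur.SHOULD),
            new Clause("c", Occur.MUST));
    }

    // A "document" is just the set of terms it contains.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    // Flag semantics: every MUST term present, no MUST_NOT term present,
    // and at least one SHOULD term present only if there is no MUST clause.
    static boolean matches(List<Clause> clauses, Set<String> doc) {
        boolean hasMust = false;
        boolean anyShouldMatched = false;
        for (Clause clause : clauses) {
            boolean present = doc.contains(clause.term);
            switch (clause.occur) {
                case MUST:
                    hasMust = true;
                    if (!present) return false;
                    break;
                case MUST_NOT:
                    if (present) return false;
                    break;
                default: // SHOULD
                    anyShouldMatched |= present;
            }
        }
        return hasMust || anyShouldMatched;
    }

    // The naive reading of "a OR b +c" as a pure boolean expression.
    static boolean strictBoolean(Set<String> doc) {
        return (doc.contains("a") || doc.contains("b")) && doc.contains("c");
    }
}
```

A document that contains only c matches the flag query but not the strict boolean reading; in the flag query, a and b would only affect ranking.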

> >> Silently drop as in you removed them entirely from the resultant 
> >> Query?
> >>
> > Right. `a AND (NOT b)'  parses to `a'
> Is this what we want to happen for a general purpose next generation 
> Lucene QueryParser though?  I'm not sure.  Perhaps this should be a 
> ParseException instead?
> >> That'd be easy enough to add - but is that what we want to happen?
> >> Community, thoughts?
> >>
> > Throwing an exception is presumably the other alternative.
> > Could that check be done in an overridable method?
> Good point.  Sure.
That's OK then. Throw a ParseException, and whoever doesn't like that
can override the method and either keep the query (knowing that it will
be ignored in search anyway) or drop it.
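
A rough sketch of what such an overridable check could look like. This is not the QueryParser API: the classes, the handlePureNegative hook, and the representation of clauses as strings with a leading "-" are all assumptions for illustration, and the exception is unchecked only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parse exception; unchecked to keep the sketch short.
class SketchParseException extends RuntimeException {
    SketchParseException(String message) { super(message); }
}

// Default behaviour: a group made up only of prohibited clauses is an error.
class StrictParserSketch {
    // Clauses are plain strings here; a leading "-" marks a prohibited clause.
    List<String> buildGroup(List<String> clauses) {
        boolean allProhibited = !clauses.isEmpty();
        for (String clause : clauses) {
            if (!clause.startsWith("-")) allProhibited = false;
        }
        if (allProhibited) return handlePureNegative(clauses);
        return new ArrayList<>(clauses);
    }

    // The hook: subclasses decide what happens to a purely negative group.
    protected List<String> handlePureNegative(List<String> clauses) {
        throw new SketchParseException(
            "Group contains only prohibited clauses: " + clauses);
    }
}

// Alternative behaviour: drop the group and remember it, so the
// application can tell the user what was ignored.
class LenientParserSketch extends StrictParserSketch {
    final List<String> dropped = new ArrayList<>();

    @Override
    protected List<String> handlePureNegative(List<String> clauses) {
        dropped.addAll(clauses);
        return new ArrayList<>();
    }
}
```

With this shape, the strict default and the silent-drop behaviour differ only in one small method, and a third subclass could just as well return the clauses unchanged.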

> >>> In an application, I handled this by dropping the query and notifying
> >>> the user that some part of the query could not be handled and was
> >>> ignored.
> >>
> >> How did your application notice that part of the query was dropped?
> >>
> > QueryParser told it ;-)
> > I used a modified query parser that was provided with two StringBuffers
> > and QP filled one with dropped stop words (something to report to the
> > user as well) and the other with dropped subqueries.
> Perhaps QueryParser should have some additional hooks allowing you to 
> either subclass and tap into things or pass in some sort of custom 
> listener hook?
> I'm interested in how you trapped the stop words that were removed - 
> did you use a custom analyzer that gave you this information?  Or some 
> other technique?
Right. That isn't obvious. Actually I had to look into my code (written
about a year ago).
I filtered stop words in the query parser itself. That is, the analyzer
didn't use any stop words but the QP had a stop word map and checked 
the analyzer output.
It's more complicated than simply using the same analyzer as indexing did,
but I think it matters to users to know what happened to their query.
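
Roughly, the idea looks like the sketch below. It is not my original code and uses no Lucene classes: the analyze method is a stand-in for an analyzer without stop words, and the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: stop words are filtered after analysis, inside the
// parser, so that the dropped words can be reported to the user.
final class StopWordReportingSketch {
    private final Set<String> stopWords;
    private final List<String> droppedStopWords = new ArrayList<>();

    StopWordReportingSketch(String... stopWords) {
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }

    // Stand-in for an analyzer that removes no stop words:
    // lower-cases the text and splits it on anything that is not a letter.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) tokens.add(token);
        }
        return tokens;
    }

    // Checks the analyzer output against the stop word set, keeps the rest,
    // and records every dropped word in order of appearance.
    List<String> filter(List<String> analyzedTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : analyzedTokens) {
            if (stopWords.contains(token)) {
                droppedStopWords.add(token);
            } else {
                kept.add(token);
            }
        }
        return kept;
    }

    // Getter the application can call after parsing to notify the user.
    List<String> getDroppedStopWords() {
        return Collections.unmodifiableList(droppedStopWords);
    }
}
```

The price is that the stop word list has to be kept in sync with whatever the indexing analyzer removes.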

> > In that case QP would keep the information in instance variables and
> > provide getter methods.
> > Having a list of recognized tokens could be helpful as well (e.g. to
> > create spelling suggestions).
> We're getting vastly more sophisticated than just correcting the order 
> of precedence.  I can continue to fiddle with this over time, but I 
> won't be able to dedicate a solid effort to all of these.  Feel free to 
> contribute patches that enhance it.  Let me know if my 
> PrecedenceQueryParser needs tweaks to be a usable base or if starting 
> over with QueryParser is better.
Ok, I'll see how much time I can find for this.

