lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Avoiding ParseExceptions
Date Tue, 06 Jun 2006 21:01:51 GMT
That way madness lies......

I suspect that you'll find that there are a few rules you can apply that
will allow you to "fix" a lot of queries, but... is that really what you
want to do? For instance, a user types

"a and or not b"

Whatever you do, it isn't what the *next* user who types something like that
means.

That said, the ParseException thrown by the parser tells you something about
what went wrong, but nothing I'd show a user, except perhaps the position of
the offending token.

Also, no matter what you fix up, there'll always be case N+1 that you didn't
think of.

I suppose the real question is "how many liberties are you willing to take
with the user's query and will the user notice?" Google just wants to return
data, man. Correctness isn't something they really expect users to complain
about. They're better off just simplifying the query until it parses.

If your users expect predictable results from known data, you can take fewer
liberties than if the users just want *something* back. In the latter case,
ignore my comments and give them something <G>. In the former case, you'll
have to deal with users who submit queries and *don't* get back what they
expect because you "fixed" something in a way they didn't anticipate. IMO, it's
better to just let these users know the query was malformed and perhaps give
them a hint about why.

If you fix the query, I'd really, really, really recommend that you take a
minimalist approach. Something like:

  1. Run the query through the parser.
  2. If it parses OK, return the results.
  3. Otherwise, strip out everything (e.g. and, or, not, parens, etc.) and
     let the defaults take over.
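A minimal Java sketch of that parse-then-fall-back approach might look like the following. To keep it self-contained and runnable without Lucene on the classpath, the hypothetical `QueryParserLike` interface and the toy `TOY_PARSER` stand in for Lucene's `QueryParser`; with Lucene you'd call `QueryParser.parse` and catch its checked `ParseException` rather than `IllegalArgumentException`. The stripping rules (turn anything that isn't a letter, digit or whitespace into a space, then drop and/or/not in any case) are an assumption of this sketch, not a Lucene feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class LenientQuery {

    /** Hypothetical stand-in for a real query parser such as Lucene's QueryParser. */
    interface QueryParserLike {
        /** Returns a parsed form of the query, or throws IllegalArgumentException if malformed. */
        String parse(String query);
    }

    /** Toy parser for the demo: rejects unbalanced parentheses, otherwise echoes the query. */
    static final QueryParserLike TOY_PARSER = q -> {
        int depth = 0;
        for (char c : q.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                throw new IllegalArgumentException("unexpected ')'");
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unclosed '('");
        }
        return "parsed[" + q + "]";
    };

    // Operator words to drop, compared case-insensitively (an assumption of this sketch).
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    /** Strips syntax characters and operator words, leaving only plain terms. */
    static String stripSyntax(String query) {
        // Anything that is not a letter, digit or whitespace becomes a space:
        // parens, quotes, +, -, !, &, |, :, *, ~ and so on.
        String cleaned = query.replaceAll("[^\\p{L}\\p{N}\\s]", " ");
        List<String> kept = new ArrayList<>();
        for (String token : cleaned.trim().split("\\s+")) {
            if (!token.isEmpty() && !OPERATORS.contains(token.toUpperCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    /** Tries the query as typed; on failure, retries once with the stripped query. */
    static String parseLeniently(QueryParserLike parser, String userQuery) {
        try {
            return parser.parse(userQuery);
        } catch (IllegalArgumentException e) {
            // If this second attempt also throws, the exception reaches the caller,
            // which should then tell the user the query was malformed.
            return parser.parse(stripSyntax(userQuery));
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLeniently(TOY_PARSER, "foo AND (bar)"));   // parsed[foo AND (bar)]
        System.out.println(parseLeniently(TOY_PARSER, "a and or not (b")); // parsed[a b]
    }
}
```

The fallback is deliberately blunt: field prefixes like `title:` and phrase quotes disappear along with the operators, so a user who typed a field or phrase query silently gets a plain term search. If even the stripped query fails (say, the user typed nothing but operators), the exception propagates and you're back to telling the user the query was malformed.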

It depends on what you want your users to experience and how much time you
want to spend explaining that the program is really behaving as you expect
<G>.....

Of course, this all may be irrelevant depending on your problem
space...

Best
Erick
