lucene-dev mailing list archives

From "Uwe Schindler" <>
Subject RE: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count
Date Sun, 23 May 2010 18:39:10 GMT
> On Sun, May 23, 2010 at 1:00 PM, Uwe Schindler <> wrote:
> > I just want to make the feature accessible and documented without
> > Version.
> I think it is just a bug (a shoddy implementation that does not use the syntax,
> whether it was quoted or not, since this has been thrown away). In this
> implementation no one thought about languages that don't use whitespace
> and that it would make all queries into phrasequeries.

I am happy with your API changes that add the "quoted" parameter, but on top of that
we should also have a boolean switch to preserve the old behavior for users who do not work
with CJK or other non-whitespace-delimited languages. We could also add a method to
TokenStream/Analyzer called "isTokenizingOnWhitespace" (don't take this seriously!).
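
To make the proposal concrete, here is a minimal sketch of the decision being discussed. It
is plain Java with no Lucene dependency, and every name in it (PhraseDecisionSketch,
usePhraseQuery, autoGeneratePhraseQueries) is hypothetical and chosen for illustration only;
it is not the patch attached to the issue.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseDecisionSketch {

    /**
     * Decides whether the terms an analyzer produced for one chunk of query text
     * should be combined into a phrase query.
     *
     * Hypothetical helper, not part of Lucene's QueryParser API.
     *
     * @param terms                     analyzed terms for one piece of query text
     * @param quoted                    true if the user wrote the text inside double quotes
     * @param autoGeneratePhraseQueries assumed switch that preserves the old behavior
     */
    public static boolean usePhraseQuery(List<String> terms,
                                         boolean quoted,
                                         boolean autoGeneratePhraseQueries) {
        if (terms.size() < 2) {
            return false; // a single term can never form a phrase
        }
        if (quoted) {
            return true; // explicit syntax always wins
        }
        // Old behavior: more than one term implies a phrase, quoted or not.
        return autoGeneratePhraseQueries;
    }

    public static void main(String[] args) {
        // Unquoted "wi-fi", assumed to be split into two terms by the analyzer.
        List<String> terms = Arrays.asList("wi", "fi");

        System.out.println(usePhraseQuery(terms, false, true));  // true  (old behavior)
        System.out.println(usePhraseQuery(terms, false, false)); // false (syntax-driven)
        System.out.println(usePhraseQuery(terms, true, false));  // true  (user quoted it)
    }
}
```

With the switch off, only the query syntax decides, which is what matters for languages
where the analyzer always emits several terms for unquoted text.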

> I really do not think this sort of code belongs inside core lucene, if you want
> to make uninternationalized code in your own code base that is not correct
> that is fine.

The issue does not affect me at all. I (and also Shai) use our own query parsers that do
exactly that (my content is mixed-language, preferably English-only with some foreign
fragments, and no CJK or the like). So here you are correct; I just foresee thousands of
issues and bug reports, because in the western European world users build phrases the way
I described (and that is what was behind the code you don't like).

> Furthermore by preserving this kind of bug it makes the queryparser more
> complicated, and especially in the future. If at some point in the future you
> want to really have the QP not split on whitespace (as you yourself said on
> the issue you want) to enable support for multi-word synonyms and "real"
> n-grams at querytime, I hope you understand this buggy code conflicts and
> complicates this later goal.

In 3.1 we don't remove the QP from core, so let's fix it there. In 4.0 we can possibly have
a totally new QP with no backwards compatibility at all, so no problem. As Earwin noted in
another reply, his comment describes the way the QP should work, but this is hard to do with
analyzers in front (I would have some ideas). But then we must do it without crappy javacc
or jflex, using only Analyzer plus some self-coded stuff.

Let's drink some beer and think about it (too bad that I am out of Czech now). Possibly at
Berlin! :-)

