lucene-dev mailing list archives

From Robert Muir
Subject Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count
Date Sun, 23 May 2010 16:46:56 GMT
On Sun, May 23, 2010 at 12:34 PM, Shai Erera wrote:

> I want to stress that not all ngram-based languages are affected by this
> behavior, especially those for which we do ngram just because of a lack of
> good tokenizer.

They are also affected! Do you understand how the queryparser treats
whitespace? You cannot currently use "normal" word-spanning n-grams
with Lucene, for two reasons:

1) you can only use word-internal n-grams because each
whitespace-separated word gets its own tokenstream
2) all queries here are also made into phrasequeries automatically,
which is stupid as n-grams already contain the 'positional
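
To make the two points above concrete, here is a rough sketch in plain Python. It does not use Lucene at all: the function names, the bigram size and the ("phrase", ...) / ("term", ...) tuples are made up for illustration, and the term-count rule is only a simplified model of the behavior described in this thread, not the parser's actual code.

```python
def char_ngrams(text, n=2):
    """Return all character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def parser_style_ngrams(query, n=2):
    """Model of point 1: split on whitespace first, then analyze each
    word on its own, so no n-gram can cross a word boundary."""
    return [char_ngrams(word, n) for word in query.split()]


def word_spanning_ngrams(query, n=2):
    """Analyze the whole query string at once, so n-grams may span
    the whitespace between words."""
    return char_ngrams(query, n)


def parser_style_query(query, n=2):
    """Model of point 2 (an assumption, simplified): any word that
    analyzes to more than one term becomes a phrase clause."""
    clauses = []
    for grams in parser_style_ngrams(query, n):
        if len(grams) > 1:
            clauses.append(("phrase", grams))
        elif grams:
            clauses.append(("term", grams[0]))
    return clauses


query = "new york"
per_word = parser_style_ngrams(query)   # n-grams stay inside each word
spanning = word_spanning_ngrams(query)  # includes "w " and " y"
clauses = parser_style_query(query)     # every clause ends up a phrase
```

With "new york", the per-word analysis never produces the grams that straddle the space, and both words turn into phrase clauses purely because each one yields more than one term.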

Robert Muir
