lucene-dev mailing list archives

From: Robert Muir <rcm...@gmail.com>
Subject: Re: [jira] Commented: (LUCENE-2458) queryparser shouldn't generate phrasequeries based on term count
Date: Sun, 23 May 2010 16:46:56 GMT
On Sun, May 23, 2010 at 12:34 PM, Shai Erera <serera@gmail.com> wrote:

> I want to stress that not all n-gram-based languages are affected by this
> behavior, especially those for which we use n-grams only because we lack a
> good tokenizer.
>

They are also affected! Do you understand how the queryparser treats
whitespace? You cannot currently use "normal" word-spanning n-grams
with Lucene because of this:

1) you can only use word-internal n-grams, because each
whitespace-separated word gets its own TokenStream
2) every query here is also automatically turned into a phrase query,
which is pointless, because n-grams already carry the 'positional
information'
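To make point 1 concrete, here is a toy Java simulation. It is not Lucene code and not how the queryparser is actually implemented; it only models the splitting behavior described above, using word bigrams ("shingles") as the example of word-spanning n-grams. The class and method names (`ShingleSplitDemo`, `wordNGrams`, `analyzePerChunk`, `analyzeWholeQuery`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSplitDemo {

    // Word n-grams ("shingles") over one token list, e.g. n=2:
    // [new, york, city] -> [new york, york city].
    static List<String> wordNGrams(List<String> words, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }

    // Hypothetical model of the parser behavior described above: each
    // whitespace-separated chunk is analyzed in isolation, so an n-gram
    // can never span two chunks.
    static List<String> analyzePerChunk(String query, int n) {
        List<String> grams = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            grams.addAll(wordNGrams(List.of(chunk), n));
        }
        return grams;
    }

    // Analyzing the whole query as a single stream keeps the
    // word-spanning n-grams.
    static List<String> analyzeWholeQuery(String query, int n) {
        return wordNGrams(Arrays.asList(query.trim().split("\\s+")), n);
    }

    public static void main(String[] args) {
        String query = "new york city";
        System.out.println(analyzePerChunk(query, 2));   // prints []
        System.out.println(analyzeWholeQuery(query, 2)); // prints [new york, york city]
    }
}
```

When every chunk is analyzed on its own, a one-word chunk can never yield a bigram, so the query "new york city" produces no shingles at all; analyzing the query as one stream yields both "new york" and "york city".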

-- 
Robert Muir
rcmuir@gmail.com


