incubator-lucy-dev mailing list archives

From Nathan Kurz <n...@verse.com>
Subject Re: [lucy-dev] Re: [KinoSearch] Stopwords and AND queries
Date Thu, 16 Dec 2010 20:04:30 GMT
I've only glanced at this, but neither NULL nor a VoidQuery really
seems like the actual solution here.  If a user searches for [foo
test], what they want is a list of documents that contain both.  They
don't want the documents that contain only [test], with no notice that
[foo] is stop listed.

If I search for "The Smiths", it means I'm searching for that term.
If "The" is stop listed, there is simply no way to answer a query that
uses it with anything but an error message.  It seems like there also
needs to be treatment at the QueryParser level to reject or modify
queries that attempt to use stop terms.  More generally, it seems like
Stop Lists themselves should be discouraged as a shortcut from earlier
times when disk storage was at a premium.

Which is to say, I think the current behaviour is correct.  If you
manage to get a query through asking for a stop listed term, the
answer is that it is not there, whether in a phrase or an AND. Courtesy
says that you would return an error message or correct the query, but
this should be handled by the front end and not by the index proper.
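
A minimal sketch of such a front-end check, in Python for brevity. The
names STOP_WORDS, StopWordError and check_query are hypothetical and are
not part of Lucy or KinoSearch; the whitespace tokenizer and the
one-word stop list are assumptions made only for illustration.

```python
# Hypothetical front-end check: none of these names exist in Lucy or
# KinoSearch. The stop list here is an assumed example.
STOP_WORDS = frozenset({"the"})


class StopWordError(ValueError):
    """Raised when a query uses terms that were never indexed."""

    def __init__(self, terms):
        super().__init__("stop-listed terms in query: " + ", ".join(terms))
        self.terms = terms


def check_query(query, stop_words=STOP_WORDS):
    """Return the query's lowercased terms, or raise if any is stop-listed."""
    terms = query.lower().split()
    stopped = [t for t in terms if t in stop_words]
    if stopped:
        # The front end can show this to the user or rewrite the query;
        # the index itself never sees the stop-listed term.
        raise StopWordError(stopped)
    return terms
```

With this, check_query("The Smiths") raises StopWordError naming "the",
and the caller decides whether to report the error or drop the term and
say so.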

--nate

ps.   If you still feel you need to act, I think you need something
like a static StopTerm and to allow the Boolean query classes to
decide how they want to treat this.  But I'd recommend against adding
this complexity unless you're certain it's a real problem that can't
be handled as interface.
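
One way the StopTerm idea could look, sketched in Python under loud
assumptions: STOP_TERM, parse_terms, and_matches and or_matches are
hypothetical names, documents are modelled as plain sets of terms, and
the strict flag is invented here to show the two policies an AND could
choose between. None of this is Lucy or KinoSearch code.

```python
# Hypothetical sketch: a single shared sentinel stands in for any
# stop-listed term, and each boolean operator decides what it means.
class _StopTerm:
    def __repr__(self):
        return "STOP_TERM"


STOP_TERM = _StopTerm()


def parse_terms(terms, stop_words):
    """Replace stop-listed terms with the shared sentinel."""
    return [STOP_TERM if t in stop_words else t for t in terms]


def and_matches(children, doc_terms, strict=True):
    """AND over terms; doc_terms is the set of terms in one document."""
    if any(c is STOP_TERM for c in children):
        if strict:
            # Strict policy: a stop-listed term is never in the index,
            # so the conjunction cannot match.
            return False
        # Lenient policy: ignore stop terms and match on the rest.
        children = [c for c in children if c is not STOP_TERM]
        if not children:
            return False  # nothing left to match on
    return all(c in doc_terms for c in children)


def or_matches(children, doc_terms):
    """OR over terms; a stop term simply contributes no matches."""
    return any(c in doc_terms for c in children if c is not STOP_TERM)
```

The cost is visible even at this size: every boolean operator needs its
own rule for the sentinel, which is the complexity to weigh against
handling the problem in the interface.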

On Thu, Dec 16, 2010 at 9:42 AM, Robert Muir <rcmuir@gmail.com> wrote:
> On Wed, Dec 15, 2010 at 3:08 PM, Marvin Humphrey <marvin@rectangular.com> wrote:
>> On Wed, Dec 15, 2010 at 04:28:54PM +0100, Nick Wellnhofer wrote:
>>> I only had a cursory glance at the code and it seems that returning NULL
>>> is the easiest approach though it looks a bit hackish. Introducing a new
>>> VoidQuery class is probably the cleanest solution but I guess it
>>> requires a lot more additional code.
>
> FWIW (definitely not trying to imply it's the best!), NULL is what
> lucene-java does if the Analyzer returns zero tokens.
> This means it has to be careful too in all the other processing, for
> example the code that applies boost has to handle the query == null
> case.
>
