lucene-dev mailing list archives

From "Robert Muir (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3145) Velocity /browse GUI should stick to AND as defaultOperator
Date Tue, 13 Mar 2012 17:40:38 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228533#comment-13228533
] 

Robert Muir commented on SOLR-3145:
-----------------------------------

{quote}
But Jan is talking about just changing the default for just an example GUI (/browse), and
not any query parsers. 
{quote}

I think it's pretty important. The problem is that in some languages, someone enters a search
query with some useless particle or something and misses documents completely only because of
grammatical structure.

Also, for a lot of languages (e.g. Chinese), tokenization into 'query terms' is not even close
to completely accurate!

{quote}
That's pretty minor - not a big deal either way, but I do think that from a "finished product"
perspective, more people expect all of their query terms to appear in matching documents (and
I believe this is how google does it?
{quote}

This is false. Search for 'lucid in imagination' and look at the first result: it does not
contain the word 'in'.
This is just an illustration of my point (it's hard to come up with examples for English),
but other examples would be simple things like searching for U.S.A-China relations and missing
documents that have U.S.-China relations.
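
A toy sketch of the stopword case above, assuming plain whitespace tokenization with no
stopword removal or stemming. The document text and the helper names are hypothetical and
this is not how Solr's query parsers are implemented; it only shows why requiring every
query term can drop an otherwise relevant document.

```python
# Toy illustration only: whitespace tokenization, no stopword removal, no stemming.
def tokens(text):
    return set(text.lower().split())


def matches(query, doc, operator):
    q, d = tokens(query), tokens(doc)
    if operator == "AND":
        return q <= d       # every query term must be present in the document
    return bool(q & d)      # OR: at least one query term is present


doc = "Lucid Imagination provides search expertise"  # hypothetical document
query = "lucid in imagination"

print(matches(query, doc, "AND"))  # False: the document lacks 'in'
print(matches(query, doc, "OR"))   # True
```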

In general most of the stopwords lists we have are very incomplete and minimal: I think this
is good. But if you choose to use AND as a default, you need to be much more aggressive about
these things.

Also, I'm completely failing to mention that use cases involving more natural language
searches (e.g. longer queries) would really suffer more here.

Again I think: don't wire the queryparser to force 100% query-term-importance, lean on the
ranking system to do this.
As I mentioned, it's my opinion there are serious problems with Lucene's sqrt() tf normalization
(it grows too fast and does not represent the information gain of additional term occurrences
well), causing additional occurrences of only a few terms to blow up the score versus documents
that actually do contain all terms: but we shouldn't solve that with a hammer like this.
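
A rough sketch of the sqrt() concern, under simplifying assumptions: both query terms have
equal idf, length norms and query normalization are ignored, and the only factors kept are
the sqrt(freq) term-frequency factor and a coord factor (matched terms / query terms).
The function names are made up for illustration and this is not Lucene's actual scoring code.

```python
import math


def sqrt_tf(freq):
    # Term-frequency factor modeled as the square root of the raw frequency.
    return math.sqrt(freq)


def toy_score(term_freqs, num_query_terms):
    # Simplified score: coord * sum of tf factors (idf and norms assumed equal, so omitted).
    matched = [f for f in term_freqs if f > 0]
    coord = len(matched) / num_query_terms
    return coord * sum(sqrt_tf(f) for f in matched)


doc_all_terms = toy_score([1, 1], 2)   # contains both query terms once each
doc_one_term = toy_score([25, 0], 2)   # repeats one term 25 times, lacks the other

print(doc_all_terms, doc_one_term)     # 2.0 2.5
```

In this toy model the document that is missing a query term outranks the one that contains
both, once the single term is repeated often enough.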

So from a 'finished product' perspective I think it should work reasonably well for as many
languages and use cases as possible out of the box: it should be generic. This kind of tuning
that's specific to only certain use cases/languages/configurations is well documented
(it's easy to change the default operator) and not tricky to do.
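
As a hedged illustration of that kind of per-request tuning, the sketch below builds a query
URL that asks for all terms to match by setting the DisMax "mm" parameter to 100%. The host,
port, handler path and the build_url helper are assumptions for the example, not part of
this issue or its patch.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # assumed local Solr URL, adjust as needed


def build_url(query, require_all_terms):
    # Hypothetical helper: only adds mm=100% when all terms should be required.
    params = {"q": query, "defType": "edismax"}
    if require_all_terms:
        params["mm"] = "100%"
    return BASE + "?" + urlencode(params)


print(build_url("lucid in imagination", require_all_terms=True))
print(build_url("lucid in imagination", require_all_terms=False))
```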

                
> Velocity /browse GUI should stick to AND as defaultOperator
> -----------------------------------------------------------
>
>                 Key: SOLR-3145
>                 URL: https://issues.apache.org/jira/browse/SOLR-3145
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>    Affects Versions: 4.0
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-3145.patch
>
>
> After SOLR-1889 was committed, the DisMax "mm" parameter defaults to whatever set in
> q.op. Since defaultOperator in schema.xml is OR, this means that DisMax now defaults to OR
> (mm=0) instead of the old default (mm=100%). It should stick to AND as before.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

