lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1553) extended dismax query parser
Date Mon, 07 Dec 2009 18:09:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787021#action_12787021
] 

Hoss Man commented on SOLR-1553:
--------------------------------

Thoughts while reading the code...

* the code is kind of hard to read ... there's a serious dirth of comments
* reads very kludgy, clearly a hacked up version of DisMax ... probably want to refactor some
helper functions (that can then be documented) 
* the clause.field and getFieldName functionality is dangerous for people migrating from edismax->dismax
(users guessing field names can query on fields the solr admin doesn't want them to query
on) ... we need an option to turn that off.
** one really nice thing about the field query support though: it looks like it would really
be easy to add support for arbitrary field name aliasing with something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4

** perhaps getFieldName should only work for fields explicitly enumerated in a param?
* why is "TO" listed as an operator when building up the phrase boost fields? (line 296) ...
if range queries are supported, then shouldn't the upper/lower bounds also be striped out
of the clauses list?
** accepting range queries also seems like something that people should be able to disable
* apparently "pf" was changed to iteratively build boosting phrase queries for every 'pair'
of words, and "pf3" is a new param to build boosting phrase queries for every 'triple' of
words in the input. while this certainly seems useful, it's not back-compatable .. why not
restore 'pf' to it's original purpose, and add "pf2" for hte pairs?
* what is the motivation for ExtendedSolrQueryParser.makeDismax? ... i see that the boost
queries built from the pf and pf3 fields are put in BooleanQueries instead of DisjunctionMaxQueries
... but why? (if the user searches for a phrase that's common in many fields of one document,
that document is going to get a huge score boost regardless of the "tie" value, which kind
of defeats the point of what the dismax parser is trying to do)
* we should remove the extremely legacy "/* legacy logic */" for dealing with "bq" ... almost
no one should care about that, we really don't need to carry it forward in a new parser.
* there are a lot of empty catch blocks that seem like they should at least log a warning
or debug message.
* ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i don't think it
works correctly if a CharFilter is declared.
* we need to document all these new params ("pf3", "lowercaseOperators", "boost", 


Thoughts while testing it out on some really hairy edge cases that break the old dismax parser...

* this is really cool
* this is really freaking cool.
* still has a problem with search strings like "foo &&" and "foo ||" ... i suspect
it would be an easy fix to recognize these just like AND/OR are recognized and escaped.
* once we fix some of hte issues mentioned above, we should absolutely register this using
the name "dismax" by default, and register the old one as "oldDismax" with a note in CHANGES.txt
telling people to use defType=oldDismax if they really need it.




> extended dismax query parser
> ----------------------------
>
>                 Key: SOLR-1553
>                 URL: https://issues.apache.org/jira/browse/SOLR-1553
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>             Fix For: 1.5
>
>         Attachments: SOLR-1553.patch
>
>
> An improved user-facing query parser based on dismax

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message