Date: Fri, 19 Oct 2012 19:34:12 +0000 (UTC)
From: "KuroSaka TeruHiko (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Updated] (SOLR-3962) For the match-all-docs query *:*, (e)dismax parser passes "*:*" to tokenizer, sub-optimal (<1.0) hit scores

     [ https://issues.apache.org/jira/browse/SOLR-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KuroSaka TeruHiko updated SOLR-3962:
------------------------------------

    Summary: For the match-all-docs query *:*, (e)dismax parser passes "*:*" to tokenizer, sub-optimal (<1.0) hit scores  (was: For the match-all-docs query *:*, (e)dismax parser passes "*:*" to tokenizer. Under certain conditions, hit suboptimal (<1.0) score is reported.)
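The summary says the (e)dismax parser hands "*:*" to the field's tokenizer. As an illustration only, here is a minimal plain-Java sketch (not Lucene's NGramTokenizer API) of what a 1-gram tokenizer (minGramSize=1, maxGramSize=1, as in the reproduction steps quoted below) does to that string, and how joining the resulting tokens gives the phrase text "* : *" that appears in the debug output. The class and method names are hypothetical, and the sketch models only the one-character-per-token case.

```java
import java.util.ArrayList;
import java.util.List;

public class OneGramDemo {
    // Hypothetical stand-in for a tokenizer configured with minGramSize=1 and
    // maxGramSize=1: every character (code point) becomes its own token.
    static List<String> oneGrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    public static void main(String[] args) {
        String userQuery = "*:*";
        List<String> tokens = oneGrams(userQuery);
        System.out.println(tokens); // [*, :, *]

        // With more than one token, the pf clause becomes a phrase over these
        // terms, which Lucene prints with spaces between the terms.
        System.out.println("name:\"" + String.join(" ", tokens) + "\""); // name:"* : *"
    }
}
```

Because the string splits into three tokens rather than one, the phrase clause is built alongside the MatchAllDocsQuery instead of the pf step being skipped.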
> For the match-all-docs query *:*, (e)dismax parser passes "*:*" to tokenizer, sub-optimal (<1.0) hit scores
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3962
>                 URL: https://issues.apache.org/jira/browse/SOLR-3962
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 3.5, 3.6, 4.0
>            Reporter: KuroSaka TeruHiko
>
> My understanding is that the special match-all-docs query "\*:\*" should not be passed to tokenizers, and that every hit should have a score of 1.0. This is usually the case.
> However, when all of the following conditions are met, sub-optimal (<1.0) hit scores are reported:
> * the dismax or edismax parser is used
> * a tokenizer that splits "\*:\*" into multiple tokens is used
> * the pf parameter is specified for a field that uses that tokenizer
> Use case:
> * We created a Japanese tokenizer that happens to break "\*:\*" into three tokens, one for each symbol.
> * Our client uses this tokenizer for Japanese text with edismax on Solr 3.6.
> * They have pf=text^0.5 in the defaults section of solrconfig.xml.
> * When a search is done with the query string "\*:\*", all Japanese hits have scores much lower than 1.0.
> Below is how to simulate this situation with an NGramTokenizer (the setup is not realistic):
> 1. Run Solr with the default settings. Post all *.xml docs in example/exampledocs.
> 2. Stop Solr.
> 3. Add this fieldType:
> {noformat}
>         maxGramSize="1"
>         minGramSize="1" />
> {noformat}
> 4. Change the field definition of "name" to use "text_fake".
> 5. Restart Solr.
> 6. GET this URL:
> http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&defType=edismax&pf=name
> Below is an excerpt of the query debug output. Notice that "\*:\*" is expanded with spaces to "\* : \*":
> {noformat}
> ...
> ati
> ATI Technologies
> 33 Commerce Valley Drive East Thornhill, ON L3T 7N6 Canada
> 1415830106362871808
> 0.07443535
> *:*
> *:*
> (+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *")))/no_coord
> {noformat}
> Here is a partial stack trace from the point where the query parser calls the tokenizer:
> {noformat}
> NGramTokenizer.incrementToken() line: 112
> CachingTokenFilter.fillCache() line: 90
> CachingTokenFilter.incrementToken() line: 55
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).newFieldQuery(Analyzer, String, String, boolean) line: 513
> ExtendedDismaxQParser$ExtendedSolrQueryParser.newFieldQuery(Analyzer, String, String, boolean) line: 1018
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).getFieldQuery(String, String, boolean) line: 474
> ExtendedDismaxQParser$ExtendedSolrQueryParser(SolrQueryParser).getFieldQuery(String, String, boolean) line: 169
> ExtendedDismaxQParser$ExtendedSolrQueryParser.getQuery() line: 1163
> ExtendedDismaxQParser$ExtendedSolrQueryParser.getAliasedQuery() line: 1105
> ExtendedDismaxQParser$ExtendedSolrQueryParser.getQueries(Alias) line: 1145
> ExtendedDismaxQParser$ExtendedSolrQueryParser.getAliasedQuery() line: 1073
> ExtendedDismaxQParser$ExtendedSolrQueryParser.getFieldQuery(String, String, int) line: 989
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).handleQuotedTerm(String, Token, Token) line: 1082
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).Term(String) line: 462
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).Clause(String) line: 257
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).Query(String) line: 181
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).TopLevelQuery(String) line: 170
> ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).parse(String) line: 120
> ExtendedDismaxQParser.addShingledPhraseQueries(BooleanQuery, List, Map, int, float, int) line: 506
> ExtendedDismaxQParser.parse() line: 338
> ExtendedDismaxQParser(QParser).getQuery() line: 143
> QueryComponent.prepare(ResponseBuilder) line: 118
> SearchHandler.handleRequestBody(SolrQueryRequest, SolrQueryResponse) line: 192
> SearchHandler(RequestHandlerBase).handleRequest(SolrQueryRequest, SolrQueryResponse) line: 129
> ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
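Step 6 of the reproduction quoted above sends a single GET request. If you want to script it, here is a minimal Java sketch that rebuilds the same URL with java.net.URLEncoder. The host, port, path and parameter values are copied from step 6; the class name and helper names are hypothetical, and the sketch only builds the URL (sending it is left to whatever HTTP client you already use).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ReproUrl {
    // Base URL copied from step 6 of the reproduction.
    static final String BASE = "http://localhost:8983/solr/select";

    // Form-encodes one key or value as UTF-8. URLEncoder leaves '*' unchanged
    // and turns ':' into %3A and ',' into %2C.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Rebuilds the step-6 request URL. LinkedHashMap keeps the parameters in
    // their original order; empty values are emitted as "name=".
    static String buildUrl() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("indent", "on");
        params.put("version", "2.2");
        params.put("q", "*:*");          // the match-all-docs query
        params.put("fq", "");
        params.put("start", "0");
        params.put("rows", "10");
        params.put("fl", "*,score");     // include the score of each hit
        params.put("qt", "");
        params.put("wt", "");
        params.put("debugQuery", "on");  // include the parsed query in the response
        params.put("defType", "edismax");
        params.put("pf", "name");        // phrase-boost the field that uses text_fake

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return BASE + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```

Running main prints exactly the URL from step 6, so any parameter can be varied (for example, dropping pf) to compare scores with and without the phrase clause.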