lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Complexphrase treats wildcards differently than other query parsers
Date Mon, 09 Oct 2017 14:34:31 GMT
<face_palm/>  Right.  Sorry.

Despite appearances to the contrary, I'm not a bot designed to lead you down the garden path
of debugging for yourself with the goal of increasing the size of the Solr contributor pool...

I confirmed the failure in 6.x, but all seems to work in 7.x and trunk.  I opened SOLR-11450
and attached a unit test based on your correction of mine. 😊

Thank you, again!


-----Original Message-----
From: Bjarke Buur Mortensen [mailto:mortensen@eluence.com] 
Sent: Monday, October 9, 2017 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks again, Tim,
following your recipe, I was able to write a failing test:

    assertQ(req("q", "{!complexphrase} iso-latin1:cr\u00E6zy*")
    , "//result[@numFound='1']"
    , "//doc[./str[@name='id']='1']"
    );

Notice that cr\u00E6zy* is used as the query term. This mimics the behaviour I originally reported:
CPQP does not analyse the term because of the wildcard, so the query side never hits the
charfilter.
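
To make the mismatch concrete, here is a minimal plain-Java sketch. It is a simplified stand-in, not Lucene or Solr code: the class name, the helper methods and the single-entry mapping (\u00E6 to "ae", chosen to mirror the test data) are all assumptions made for illustration. It only shows why a wildcard term that skips the charfilter cannot match a term that was folded at index time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for illustration only; none of this is Lucene or Solr code.
final class CharFilterWildcardSketch {

    // Hypothetical one-entry mapping that imitates a single accent-folding rule.
    private static final Map<String, String> MAPPING = new LinkedHashMap<>();
    static {
        MAPPING.put("\u00E6", "ae");
    }

    // Applies every mapping rule to the text, as a char filter would before tokenizing.
    static String applyCharFilter(String text) {
        String out = text;
        for (Map.Entry<String, String> rule : MAPPING.entrySet()) {
            out = out.replace(rule.getKey(), rule.getValue());
        }
        return out;
    }

    // Matches a trailing-wildcard query such as "crae*" against one indexed term.
    // With analyzeQuery == false the prefix is compared exactly as typed,
    // which models the behaviour reported in this thread.
    static boolean prefixMatches(String indexedTerm, String wildcardQuery, boolean analyzeQuery) {
        if (!wildcardQuery.endsWith("*")) {
            throw new IllegalArgumentException("expected a trailing '*': " + wildcardQuery);
        }
        String prefix = wildcardQuery.substring(0, wildcardQuery.length() - 1);
        if (analyzeQuery) {
            prefix = applyCharFilter(prefix);
        }
        return indexedTerm.startsWith(prefix);
    }

    public static void main(String[] args) {
        // Index side: the char filter always runs, so the stored term is folded.
        String indexed = applyCharFilter("cr\u00E6zy");
        System.out.println("indexed term:      " + indexed);
        System.out.println("unanalysed query:  " + prefixMatches(indexed, "cr\u00E6zy*", false));
        System.out.println("analysed query:    " + prefixMatches(indexed, "cr\u00E6zy*", true));
    }
}
```

In this sketch the unanalysed query misses and the analysed one hits, which is the same shape as the failing assertQ above.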


2017-10-06 20:54 GMT+02:00 Allison, Timothy B. <tallison@mitre.org>:

> That could be it.  I'm not able to reproduce this with trunk.  More 
> next week.
>
> In trunk, if I add this to schema15.xml:
>   <fieldType name="text_iso_latin1_mapping" class="solr.TextField">
>     <analyzer>
>       <charFilter class="solr.MappingCharFilterFactory"
>                   mapping="mapping-ISOLatin1Accent.txt"/>
>       <tokenizer class="solr.MockTokenizerFactory"/>
>     </analyzer>
>   </fieldType>
>   <field name="iso-latin1" type="text_iso_latin1_mapping" indexed="true" stored="true"/>
>
> This test passes.
>
>   @Test
>   public void testCharFilter() {
>     assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
>     assertU(commit());
>     assertU(optimize());
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:traen")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>
>     assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
>         , "//result[@numFound='1']"
>         , "//doc[./str[@name='id']='1']"
>     );
>   }
>
>
>
> -----Original Message-----
> From: Bjarke Buur Mortensen [mailto:mortensen@eluence.com]
> Sent: Friday, October 6, 2017 6:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other 
> query parsers
>
> Thanks a lot for your effort, Tim.
>
> Looking at it from the Solr side, I see some use of local classes. The 
> snippet below in particular caught my eye (in
> solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
> The instance of ComplexPhraseQueryParser is not the clean one from 
> Lucene, but a modified one. If any of the modifications messes with 
> the analysis logic, well then that might answer it.
>
> What do you make of it?
>
> lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer()) {
>   protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
>     try {
>       org.apache.lucene.search.Query wildcardQuery =
>           reverseAwareParser.getWildcardQuery(t.field(), t.text());
>       setRewriteMethod(wildcardQuery);
>       return wildcardQuery;
>     } catch (SyntaxError e) {
>       throw new RuntimeException(e);
>     }
>   }
>
>   private Query setRewriteMethod(org.apache.lucene.search.Query query) {
>     if (query instanceof MultiTermQuery) {
>       ((MultiTermQuery) query).setRewriteMethod(
>           org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
>     }
>     return query;
>   }
>
>   protected Query newRangeQuery(String field, String part1, String part2,
>       boolean startInclusive, boolean endInclusive) {
>     boolean reverse = reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
>     return super.newRangeQuery(field,
>         reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
>         part2,
>         startInclusive || reverse,
>         endInclusive);
>   }
> };
>
> Thanks,
> Bjarke
>
>
>