lucene-dev mailing list archives

From Peter Karich <peat...@yahoo.de>
Subject WordDelimiterFilter bug
Date Thu, 18 Nov 2010 22:26:17 GMT
Hi,

I asked this on the user list, and I think I found a bug in
WordDelimiterFilterFactory when it is configured with
splitOnCaseChange="1" catenateAll="0" preserveOriginal="1" and followed
by a lowercase filter.
Add the following test (*) and append the field definition (**) to
schema.xml; the test will not pass. Should I open a JIRA issue for this,
or is this not a bug and I have missed something?
(The strange thing is that the admin GUI highlights it correctly.)

Regards,
Peter.

BTW: I just read the code of SpellCheckCollator because it didn't
compile. It currently reads:
} catch (Exception e) {
           Log.warn("Exception trying to re-query to check if a spell check possibility would return any hits.", e);
It should NOT use the jetty Log class; using LOG instead removes the
jetty dependency:
} catch (Exception e) {
           LOG.warn("Exception trying to re-query to check if a spell check possibility would return any hits.", e);

*
   @Test
   public void testCaseChangeAndPreserve() {
     assertU(adoc("id",  "1",
                  "subword_cc", "abcd"));
     assertU(adoc("id",  "2",
                  "subword_cc", "abCd.com"));
     assertU(commit());

     assertQ("simple - case change and preserve",
             req("subword_cc:(abcd)")
             ,"//result[@numFound=1]"
     );
     // currently returns only doc 2;
     // should also return doc 1 because abCd should be preserved and then
     // lowercased by the lowercase filter (for the query)
     assertQ("camel case query - case change and preserve",
             req("subword_cc:(abCd)")
             ,"//result[@numFound=2]"
     );
     // currently returns 0 docs;
     // should return doc 2 because abCd.com should be preserved and then
     // lowercased by the lowercase filter (for the index)
     assertQ("camel case domain - case change and preserve",
             req("subword_cc:(abcd.com)")
             ,"//result[@numFound=1]"
     );
     clearIndex();
   }

**
<fieldtype name="subword_cc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
            catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<field name="subword_cc" type="subword_cc" indexed="true" stored="true"/>
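
As a rough illustration of the token streams the test above assumes, here is a minimal plain-Java sketch. It is NOT Lucene's WordDelimiterFilter implementation: the class name WordDelimiterSketch and its methods are hypothetical, and the model only covers splitting on non-alphanumeric delimiters and on lower-to-upper case changes, keeping the original token, and lowercasing. Letter/digit splitting and token positions (which also influence how a multi-token query matches) are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model only: NOT Lucene's WordDelimiterFilter implementation.
class WordDelimiterSketch {

    // Splits on non-alphanumeric characters and on lower-to-upper case changes.
    static List<String> splitParts(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (!Character.isLetterOrDigit(c)) {
                // Delimiter: close the current part and drop the delimiter.
                flush(current, parts);
                continue;
            }
            boolean caseChange = current.length() > 0
                    && Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(c);
            if (caseChange) {
                flush(current, parts);
            }
            current.append(c);
        }
        flush(current, parts);
        return parts;
    }

    private static void flush(StringBuilder current, List<String> parts) {
        if (current.length() > 0) {
            parts.add(current.toString());
            current.setLength(0);
        }
    }

    // Models preserveOriginal="1" followed by a lowercase filter (assumption:
    // the original is emitted only when splitting actually changed the token).
    static List<String> analyze(String token) {
        List<String> out = new ArrayList<>();
        if (token.isEmpty()) {
            return out;
        }
        List<String> parts = splitParts(token);
        boolean unchanged = parts.size() == 1 && parts.get(0).equals(token);
        if (!unchanged) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        for (String part : parts) {
            out.add(part.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abcd", "abCd", "abCd.com", "abcd.com"}) {
            System.out.println(s + " -> " + analyze(s));
        }
    }
}
```

Under this model, "abCd" yields [abcd, ab, cd] and "abCd.com" yields [abcd.com, ab, cd, com], so the lowercased original is present on both the index and the query side, which is what the three assertions in the test rely on.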


