lucene-solr-user mailing list archives

From Romani Rupasinghe <>
Subject KeywordTokenizerFactory splits the string for the exclamation mark
Date Tue, 13 May 2014 15:14:54 GMT
Hi All

I have the following field settings in my Solr schema:

<field name="Exact_Word" omitPositions="true" termVectors="false"
omitTermFreqAndPositions="true" compressed="true" type="string_ci"
multiValued="false" indexed="true" stored="true" required="false"

<field name="Word" compressed="true" type="email_text_ptn"
multiValued="false" indexed="true" stored="true" required="false"

<fieldtype name="string_ci" class="solr.TextField" sortMissingLast="true"

<copyField source="Word" dest="Exact_Word"/>

As you can see, Exact_Word uses the KeywordTokenizerFactory, which should
treat the whole string as a single token, exactly as it is.
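
For illustration only, a case-insensitive keyword field type is often declared along the lines below. This is a sketch, not the actual string_ci definition from this schema: the name string_ci_example and the use of a lowercase filter are assumptions.

```xml
<!-- Hypothetical example; the name and the filter choice are assumptions -->
<fieldType name="string_ci_example" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- Emits the entire input as one token, with no splitting -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Lowercases that single token so matching ignores case -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With an analyzer like this, analysis of a field value produces a single lowercased token containing the whole string.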

The following is my responseHeader. As you can see, I am searching for my
string only in the field Exact_Word and expect it to return the Word field
and the score.


But when I enter an email containing the string "d!", the string is split in two. I was under the
impression that KeywordTokenizerFactory would treat the string as it is.

The following is the query debug result, where you can see that the word has
been split.

Can someone please tell me why it produces this query result?

If I enter a string without the "!" sign, the produced query is as follows:
 "parsedquery":"+DisjunctionMaxQuery((",
This is what I expected Solr to do even with the "!" mark. With the "_" mark
it does not split the string and treats it as it is.

I thought that if the KeywordTokenizerFactory is applied, it should return
the exact string as it is.

Please help me understand what is going wrong here.
