lucene-solr-user mailing list archives

From Sebastian M <mihais...@yahoo.com>
Subject Solr Spellchecker automatically tokenizes on period marks
Date Wed, 22 Dec 2010 15:32:18 GMT

Hello,


My main (full text) index contains the terms "www", "sometest", "com", which
is intended and correct.

My spellcheck index contains the term "www.sometest.com", which is also
intended and correct.

However, when querying the spellchecker using the query "www.sometest.com",
I get the suggestion "www.www.sometest.com.com", despite the fact that I'm
not using a tokenizer that splits on "." (period marks) as part of my
spellcheck query analyzer. 

When running the Field Analyzer (in the Solr admin page), I can see that
even after the last filter (see below), my term text remains
"www.sometest.com", which is untokenized, as expected. 

Any thoughts as to what may be causing this undesired tokenization?

To summarize:

Main index contains: "www", "sometest", "com"
Spellcheck index contains: "www.sometest.com"
Spellcheck query: "www.sometest.com"
Expected result: (no suggestion)
Actual result: "www.www.sometest.com.com"


Here is my spellcheck query analyzer:
<analyzer type="query">
	<tokenizer class="solr.WhitespaceTokenizerFactory"/>
	<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
	<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
	<filter class="solr.StandardFilterFactory"/>
	<filter class="solr.LowerCaseFilterFactory"/>
	<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
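
To make the expectation concrete, here is a rough Python sketch of the behavior I expect from this analyzer. It is not Solr code: the function names are made up for illustration, and the synonym, stop-word, standard, and duplicate-removal filters are left out (their input files are not shown here, and they should not affect a single token). It only contrasts splitting on whitespace with splitting on periods as well.

```python
import re


def whitespace_analyze(text):
    """Simplified stand-in for the query analyzer above:
    split on whitespace only, then lowercase each token."""
    return [token.lower() for token in text.split()]


def period_splitting_analyze(text):
    """Hypothetical analyzer that also splits on periods,
    which is how the main (full text) index ends up with three terms."""
    return [token.lower() for token in re.split(r"[\s.]+", text) if token]


query = "www.sometest.com"

# Whitespace-only tokenization keeps the query as one term.
print(whitespace_analyze(query))        # ['www.sometest.com']

# Splitting on periods yields the terms found in the main index.
print(period_splitting_analyze(query))  # ['www', 'sometest', 'com']
```

The first result matches what the Field Analyzer shows me; the second matches the terms in my main index, so the suggestion I get looks as if the query were being split that way somewhere before the spellcheck lookup.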



Thank you in advance; any suggestions are welcome!
Sebastian
