lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: docFreq disable / disable end of word letter removal
Date Wed, 12 Jul 2006 18:51:17 GMT
:    - Currently, I see that the docFreq is also playing in the
: scoring. Is it possible to disable this feature so that this is not
: calculated in the score ?

This is a fairly core aspect of the Lucene scoring calculation, but it can
be changed with a small bit of Java coding.  If you write your own
subclass of "Similarity" you can override the "idf" function to return a
constant value regardless of the docFreq.  You can then specify your new
Similarity class by name in your schema.xml and Solr will use it instead
of the default...

http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
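A minimal sketch of the idea, written so that it compiles without Lucene on
the classpath.  SketchDefaultSimilarity is a hypothetical stand-in for
Lucene's DefaultSimilarity (its formula mirrors the default idf as I
understand it), and ConstantIdfSimilarity and ConstantIdfDemo are made-up
names.  In a real setup you would extend the Lucene class instead and name
your subclass in schema.xml.

```java
// Hypothetical stand-in for org.apache.lucene.search.DefaultSimilarity,
// included only so this sketch runs without Lucene on the classpath.
class SketchDefaultSimilarity {
    // Assumed to mirror the default idf: rarer terms get a higher value.
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
}

// Returns the same idf for every term, so docFreq no longer
// influences the score.
class ConstantIdfSimilarity extends SketchDefaultSimilarity {
    @Override
    public float idf(int docFreq, int numDocs) {
        return 1.0f;
    }
}

public class ConstantIdfDemo {
    public static void main(String[] args) {
        SketchDefaultSimilarity standard = new SketchDefaultSimilarity();
        SketchDefaultSimilarity constant = new ConstantIdfSimilarity();

        // Default behaviour: a rare term scores higher than a common one.
        System.out.println("default rare:    " + standard.idf(1, 1000));
        System.out.println("default common:  " + standard.idf(900, 1000));

        // Overridden behaviour: both terms get the same constant.
        System.out.println("constant rare:   " + constant.idf(1, 1000));
        System.out.println("constant common: " + constant.idf(900, 1000));
    }
}
```

Only the idf factor is neutralised here; the other parts of the score
(tf, length normalisation and so on) are left untouched.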

:    - I see that solr is stripping some characters at the end of the
: search words. This is okay, but when I try, for example, a search on
: "comed", it searches for "come". Can I select when the system
: will remove which letters and when ? Or where can I disable this
: system ? The removal of the trailing "s" is great, but for some
: circumstances, the "d" removal of "comed" is not the ideal way.

This is all determined by the Analyzer used for each field (or more
generally: field type) ... this is also configured via the schema.xml.  As
with Similarity, you can write your own Java subclass to use if you want
extremely customized behavior, or you can use any of the Analyzers that
come with Lucene (by name), or you can build up an Analyzer in your
schema.xml using Solr TokenizerFactories and TokenFilterFactories.
Docs on all of the Solr Factories can be found in the wiki...

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Analyzer.html
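To illustrate the "tokenizer plus a chain of filters" idea, here is a toy
sketch in plain Java.  It does not use the Lucene or Solr APIs: every name in
it (AnalysisChainSketch, tokenize, lowerCase, stripTrailingS, analyze) is
hypothetical, and the suffix rule is a deliberately naive stand-in for a real
stemming filter.  The point is only that which filters you put in the chain
decides which letters get removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class AnalysisChainSketch {
    // Toy tokenizer: split the input on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy filter: lowercase each token.
    static String lowerCase(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Toy filter: drop a single trailing "s", and nothing else.
    static String stripTrailingS(String token) {
        boolean strip = token.length() > 3
                && token.endsWith("s")
                && !token.endsWith("ss");
        return strip ? token.substring(0, token.length() - 1) : token;
    }

    // Run every token through the filters, in order.
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : tokenize(text)) {
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> filters = new ArrayList<>();
        filters.add(AnalysisChainSketch::lowerCase);
        filters.add(AnalysisChainSketch::stripTrailingS);

        // The trailing "s" is removed, but "comed" is left alone.
        System.out.println(analyze("Comes comed", filters)); // prints [come, comed]
    }
}
```

In Solr itself the equivalent choice is made in schema.xml, by listing only
the filter factories you want for a given field type.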



-Hoss

