lucene-solr-user mailing list archives

From Joe Zhang <smartag...@gmail.com>
Subject Re: behavior of solr.KeepWordFilterFactory
Date Mon, 03 Dec 2012 11:44:44 GMT
Across-the-board case-sensitive indexing is not what I want...

Let me make sure I understand your suggestion:

<fieldType name="text1" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<fieldType name="text2" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
</fieldType>


And define content1 as text1, content2 as text2?
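
A minimal sketch of the schema.xml wiring that question implies, assuming the `text1` and `text2` types above. The `indexed`/`stored` values and the choice to copy `content1` into `content2` are illustrative assumptions, not something confirmed in this thread:

```xml
<!-- Hypothetical field definitions: content1 is lowercased, content2 keeps case -->
<field name="content1" type="text1" indexed="true" stored="true"/>
<field name="content2" type="text2" indexed="true" stored="false"/>

<!-- copyField passes the raw input to content2 before any analysis runs,
     so each field applies its own analyzer to the same original text -->
<copyField source="content1" dest="content2"/>
```

With this layout a query would need to name both fields, for example `content1:apple OR content2:AAPL` (hypothetical terms). To restrict the case-sensitive field to the closed word list, `text2` could presumably also carry the `solr.KeepWordFilterFactory` entry with `ignoreCase="false"` from the earlier `textspecial` type.
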
On Mon, Dec 3, 2012 at 1:09 AM, Xi Shen <davidshen84@gmail.com> wrote:

> A Solr index is case-sensitive by default, unless you use the lowercase
> filter. I remember seeing this topic discussed for Solr, and the solution is
> simple:
>
> copy the field;
> use a new analyzer/tokenizer to process the copied field, and do not use the
> lowercase filter;
>
> when querying, make sure both fields are included.
>
>
> On Mon, Dec 3, 2012 at 3:04 PM, Joe Zhang <smartagent@gmail.com> wrote:
>
> > In other words, what I wanted to achieve is case-sensitive indexing on a
> > small set of words. Can anybody help?
> >
> > On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang <smartagent@gmail.com> wrote:
> >
> > > To be more specific, this is the data type I was using:
> > >
> > >        <fieldType name="textspecial" class="solr.TextField"
> > >             positionIncrementGap="100">
> > >             <analyzer>
> > >                 <tokenizer class="solr.StandardTokenizerFactory"/>
> > >                 <filter class="solr.KeepWordFilterFactory"
> > >                     words="tickers.txt" ignoreCase="false"/>
> > >                 <filter class="solr.StopFilterFactory"
> > >                     ignoreCase="true" words="stopwords.txt"/>
> > >                 <filter class="solr.WordDelimiterFilterFactory"
> > >                     generateWordParts="1" generateNumberParts="1"
> > >                     catenateWords="1" catenateNumbers="1"
> > >                     catenateAll="0" splitOnCaseChange="1"/>
> > >                 <filter class="solr.LowerCaseFilterFactory"/>
> > >                 <filter class="solr.EnglishPorterFilterFactory"
> > >                     protected="protwords.txt"/>
> > >                 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >             </analyzer>
> > >         </fieldType>
> > >
> > >
> > > On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang <smartagent@gmail.com> wrote:
> > >
> > >> Yes, that is the correct behavior. But how do I achieve my goal, i.e.,
> > >> special treatment on a list of uppercase/special words, normal treatment
> > >> on everything else?
> > >>
> > >>
> > >> On Sun, Dec 2, 2012 at 11:46 PM, Xi Shen <davidshen84@gmail.com> wrote:
> > >>
> > >>> By the definition at
> > >>> https://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeepWordFilter.html
> > >>> I am pretty sure this is the correct behavior of this filter :)
> > >>>
> > >>> I guess you are trying to use this filter to index some special words in
> > >>> Chinese?
> > >>>
> > >>>
> > >>> On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang <smartagent@gmail.com> wrote:
> > >>>
> > >>> > I defined the following data type in my solr schema.xml
> > >>> >
> > >>> > <fieldtype name="testkeep" class="solr.TextField">
> > >>> >    <analyzer>
> > >>> >      <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
> > >>> >              ignoreCase="false"/>
> > >>> >    </analyzer>
> > >>> > </fieldtype>
> > >>> >
> > >>> > When I use the type "testkeep" to index a test field, my true expectation
> > >>> > was to make sure Solr indexes the uppercase form of a small list of words
> > >>> > in the file, AND TREAT EVERY OTHER WORD AS USUAL. The goal of securing the
> > >>> > closed list is achieved, but NO OTHER WORD outside the list is indexed!
> > >>> >
> > >>> > Can anybody help? Thanks in advance!
> > >>> >
> > >>> > Joe
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Regards,
> > >>> David Shen
> > >>>
> > >>> http://about.me/davidshen
> > >>> https://twitter.com/#!/davidshen84
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Regards,
> David Shen
>
> http://about.me/davidshen
> https://twitter.com/#!/davidshen84
>
