lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr pattern tokenizer
Date Mon, 02 Feb 2015 12:38:30 GMT
You do not have WordDelimiterFilterFactory in your index-time
analysis chain. And you're using different tokenizers in the two
cases. This will almost certainly lead to "surprising" results
unless you completely and thoroughly understand all the nuances
here.

I _strongly_ recommend you do not do this, and instead spend quite a bit
of time with the admin/analysis page to understand each of the
transformations in your analysis chain.

You probably want to have WordDelimiterFilterFactory in both chains;
then simply using a phrase query should do what you want, i.e.
"HDFC LTD"
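
A minimal sketch of what that might look like, assuming a single
analyzer shared by index and query time. The field type name
"text_wdf" is hypothetical, and the synonym and edge n-gram filters
from the original query chain are left out here only to keep the
example small; this is one possible arrangement, not a tested
configuration.

```xml
<!-- Sketch only: an <analyzer> without a type attribute is used for
     both indexing and querying, so both sides produce the same tokens.
     The name "text_wdf" is a hypothetical placeholder. -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits "TRAN-HDFC" into "TRAN" and "HDFC" on the hyphen -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```

With something like this in place, a phrase query such as
description:"HDFC LTD" (the field name "description" is also
hypothetical) should require the two tokens to be adjacent, so a
document containing only "HDFC MF" would not be expected to match.
Existing documents need to be reindexed after any change to the
index-time analysis, and the admin/analysis page is the place to
confirm what each filter actually emits.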

Best,
Erick

On Mon, Feb 2, 2015 at 4:21 AM, Dikshant Shahi <contactsahi@gmail.com>
wrote:

> Why have you created n-grams with a minimum size of 3? Do you also want
> matches in case of spelling mistakes?
> If you want 2 consecutive tokens to match, you can create shingles. Please
> refer to this link:
>
> https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-ShingleFilter
>
> Thanks,
> Dikshant
>
> On Mon, Feb 2, 2015 at 3:26 PM, Nivedita <nivedita.patil@tcs.com> wrote:
>
> > Hi,
> >
> > I want to tokenize a query like "CHQ PAID-INWARD TRAN-HDFC LTD" in such
> > a way that it returns documents containing HDFC LTD and not HDFC MF.
> >
> > How can I do this?
> > I have already applied the tokenizers below:
> >
> > <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.TrimFilterFactory"/>
> >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.WordDelimiterFilterFactory"
> >             generateWordParts="1" generateNumberParts="1"
> >             catenateWords="0" catenateNumbers="0" catenateAll="0"
> >             splitOnCaseChange="1"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >             ignoreCase="true" expand="true"/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
> >             maxGramSize="25" side="front"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
> >     <filter class="solr.TrimFilterFactory"/>
> >   </analyzer>
> > </fieldType>
> >
> >
> > Please help.
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Solr-pattern-tokenizer-tp4183421.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
