lucene-solr-user mailing list archives

From Emir Arnautovic <emir.arnauto...@sematext.com>
Subject Re: FW: Difference Between Tokenizer and filter
Date Wed, 02 Mar 2016 09:33:16 GMT
Hi Rajesh,
The processing flow is the same for both indexing and querying. What is
compared at the end are the resulting tokens. In general, the flow is:
text -> char filter -> filtered text -> tokenizer -> tokens -> filter1 ->
tokens ... -> filterN -> tokens.
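
As a rough sketch of that flow (not a drop-in replacement for your setup), a field type could look like the one below. It assumes one tokenizer per analyzer followed by filters, and it assumes the same chain is wanted at index and query time. The choice of StandardTokenizerFactory, LowerCaseFilterFactory and NGramFilterFactory here is only illustrative.

```xml
<fieldType name="customSearch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- optional <charFilter .../> elements would go here, before the tokenizer -->
    <!-- exactly one tokenizer: splits the text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- filters run in order on the tokens produced by the tokenizer -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <!-- same chain at query time, so query tokens are comparable to indexed tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>
```

With a chain like this, "Visal" becomes the tokens vi, is, sa, al and "Visual" becomes vi, is, su, ua, al. The shared tokens vi, is and al are what would let the misspelled query still match, though how well it ranks depends on the rest of your configuration.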

You can read more about the analysis chain in the Solr wiki: 
https://cwiki.apache.org/confluence/display/solr/Understanding+Analyzers,+Tokenizers,+and+Filters

Regards,
Emir

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 02.03.2016 10:00, G, Rajesh wrote:
> Hi Team,
>
> Can you please clarify the below? My understanding is that a tokenizer is used to say how
> the content should be indexed physically in the file system, and that filters are used to
> query the result. The lines below are from my setup. But I have seen examples that include
> filters inside <analyzer type="index"> and a tokenizer in <analyzer type="query">, and
> that confused me.
>
>   <fieldType name="customSearch" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>       <tokenizer class="solr.StandardTokenizerFactory"/>
>       <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
>     </analyzer>
>     <analyzer type="query">
>       <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
>     </analyzer>
>   </fieldType>
>
> My goal is to use Solr to find the best match among the technology names, e.g.
> Actual tech names:
>
> 1. Microsoft Visual Studio
> 2. Microsoft Internet Explorer
> 3. Microsoft Visio
>
> When a user types "Microsoft Visal Studio", the user should get Microsoft Visual Studio.
> Basically, misspelled and jumbled words should match the closest tech name.

