lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: search ignoring accents
Date Fri, 17 Apr 2015 17:07:29 GMT
Hi Pedro,

The requirement that filtering by "edr" should return "Pedro" can only be met by expanding
terms at index time.
You can remove the ngram filter from the query analyzer.
But remember that the ngram filter produces a lot of tokens. Try it on the analysis page.

As for starting at the beginning or the end, there is an EdgeNGramTokenFilter where
you can specify the side, front or back.

Ahmet
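
A minimal sketch of the setup described above, assuming a hypothetical field type name
text_general_ngram and the same gram sizes as in the original question (neither is
prescribed in this thread). The index analyzer expands each term into n-grams, including
inner ones such as "edr" from "pedro", while the query analyzer leaves the search term
whole. The ASCIIFoldingFilterFactory lines come from the earlier accent discussion and are
not needed for the "edr" case itself:

```xml
<!-- Sketch only: the name text_general_ngram and the gram sizes are assumptions -->
<fieldType name="text_general_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- fold accented characters to ASCII before the grams are built -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- emit substrings of 2 to 15 characters, not only prefixes -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- no ngram filter here: the query term is matched as typed -->
  </analyzer>
</fieldType>
```

With a setup like this, a query term shorter than minGramSize or longer than maxGramSize
finds nothing, because no gram of that length is indexed. Existing documents have to be
reindexed after the index analyzer changes.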




On Friday, April 17, 2015 2:50 PM, Pedro Figueiredo <pjlfigueiredo@criticalsoftware.com>
wrote:
And for this example, which filter should I use?

Filtering by "edr" should give the result "Pedro".
Does the NGram filter create tokens starting at the beginning or the end, and also in the middle?

Thanks!

Pedro Figueiredo
Senior Engineer

pjlfigueiredo@criticalsoftware.com
M. 934058150


Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"




-----Original Message-----
From: Pedro Figueiredo [mailto:pjlfigueiredo@criticalsoftware.com] 
Sent: 17 April 2015 12:22
To: solr-user@lucene.apache.org; 'Ahmet Arslan'
Subject: RE: search ignoring accents

Hi Ahmet,

Yes... the EdgeNGram is what produces those results...
I need it to improve the search by name for the application's users.

Thanks.

Pedro Figueiredo



-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
Sent: 17 April 2015 12:01
To: solr-user@lucene.apache.org
Subject: Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
The confusion comes from the EdgeNGram filter. Why do you need it?

Ahmet
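
A sketch of where the folding filter could go in the field type from the original question,
assuming the prefix (edge) grams are kept. The one change beyond adding
ASCIIFoldingFilterFactory is an assumption of this sketch: EdgeNGramFilterFactory is dropped
from the query analyzer. When the query is also split into edge grams, "Mourao" yields the
gram "mo", which is indexed for "Monteiro" and "Morais" as well, and that most likely
explains the extra hits:

```xml
<!-- Sketch only: same field type as in the question, with two hypothetical changes -->
<fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- change 1: fold "ã" to "a" etc. before the prefixes are generated -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <!-- fold the query the same way so "Mourao" and "Mourão" become the same term -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- change 2: no EdgeNGram filter, so the query is not reduced to short prefixes -->
  </analyzer>
</fieldType>
```

The analysis page shows the tokens produced on each side, which makes it easy to check
whether a given query term still overlaps with unrelated names.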



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo <pjlfigueiredo@criticalsoftware.com>
wrote:



Hello,

What is the best way to search in a field ignoring accents?

The field has the type:
    <fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
    </fieldType>

I’ve tried adding the filter <filter class="solr.ASCIIFoldingFilterFactory"/>,
but got some strange results, for example:

Searching for “Mourao” returned:
Mourão -> OK
Monteiro -> NOTOK
Morais -> NOTOK

Thanks in advance,

Pedro Figueiredo
