lucene-solr-user mailing list archives

From Tomas Fernandez Lobbe <tomasflo...@yahoo.com.ar>
Subject Re: Search with accent
Date Wed, 10 Nov 2010 20:47:29 GMT
You have to modify the field type you are using in your schema.xml file. This is
the "text" field type from the Solr 1.4.1 example, with this filter added:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt"
                ignoreCase="true"
                expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English"
                protected="protwords.txt"/>
      </analyzer>
    </fieldType>
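
If you would rather not touch the stock "text" type, the same idea can be sketched as a separate, stripped-down field type. This is only an illustration and not part of the Solr 1.4.1 example schema: the names "text_folded" and "title_folded" are made up here. A single analyzer element with no type attribute is used for both indexing and querying, so the two chains cannot drift apart:

```xml
<!-- Hypothetical field type: the name "text_folded" is an assumption for illustration -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: Solr applies this one analyzer at both index and query time -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Folds accented characters to ASCII equivalents, so perequê becomes pereque -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using that type; it belongs in the fields section of schema.xml -->
<field name="title_folded" type="text_folded" indexed="true" stored="true"/>
```

Documents indexed before the change still hold the accented terms, so they need to be reindexed before both spellings match.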

________________________________
From: Claudio Devecchi <cdevecchi@gmail.com>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 10, 2010 17:44:01
Subject: Re: Search with accent

OK, thanks.

I'm new to Solr; my question is how I can enable this feature. Or is this
feature already working by default?

Is this something to configure in my schema.xml?

Thanks!!


On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe <
tomasflobbe@yahoo.com.ar> wrote:

> That's what the ASCIIFoldingFilter does: it removes the accents. That's why
> you have to add it to both the query analysis chain and the index analysis
> chain, so that you search the same way you index.
>
>
>
> You can see how it works from the Analysis page on Solr Admin.
>
>
>
>
>
> ________________________________
> From: Savvas-Andreas Moysidis <savvas.andreas.moysidis@googlemail.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, November 10, 2010 17:27:24
> Subject: Re: Search with accent
>
> Have you tried using a TokenFilter which removes accents at both
> indexing and searching time? If you index terms without accents and
> search the same way, you should be able to find all documents as you
> require.
>
>
>
> On 10 November 2010 20:25, Tomas Fernandez Lobbe
> <tomasflobbe@yahoo.com.ar> wrote:
>
> > It looks like ISOLatin1AccentFilter is deprecated in Solr 1.4.1. If you are on
> > that version, you should use the ASCIIFoldingFilter instead.
> >
> > Like with any other filter, to use it you have to add the filter factory to the
> > analysis chain of the field type you are using:
> >
> > <filter class="solr.ASCIIFoldingFilterFactory"/>
> >
> > Make sure you add it to both the query and index analysis chains, otherwise you'll
> > have strange results.
> >
> > You'll have to perform a full reindex.
> >
> > Tomás
> >
> >
> >
> >
> > ________________________________
> > From: Claudio Devecchi <cdevecchi@gmail.com>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, November 10, 2010 17:08:06
> > Subject: Re: Search with accent
> >
> > Tomas,
> >
> > Let me try to explain better.
> >
> > For example.
> >
> > - I have 10 documents, where 7 have the word pereque (without accent) and 3
> > have the word perequê (with accent).
> >
> > When I search for pereque, Solr returns just 7, and when I search for
> > perequê, Solr returns 3.
> >
> > But for me, these words are the same, and when I search for perequê
> > or pereque, it should show me 10 results.
> >
> >
> > About the ISOLatin filter you mentioned, do you know how I can enable it?
> >
> > tks,
> > Claudio
> >
> > On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe <
> > tomasflobbe@yahoo.com.ar> wrote:
> >
> > > I don't understand. When the user searches for perequê, you want the results for
> > > both perequê and pereque?
> > >
> > > If that's the case, any field type with ISOLatin1AccentFilterFactory should work.
> > > The accent should be removed at index time and at query time (make sure the
> > > filter is being applied in both cases).
> > >
> > > Tomás
> > >
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Claudio Devecchi <cdevecchi@gmail.com>
> > > To: Lista Solr <solr-user@lucene.apache.org>
> > > Sent: Wednesday, November 10, 2010 15:16:24
> > > Subject: Search with accent
> > >
> > > Hi all,
> > >
> > > Does anybody know how I can configure my Solr to search both with and without
> > > accents?
> > >
> > > for example:
> > >
> > > pereque and perequê
> > >
> > >
> > > When I search for either one I need the same result, but it's not working.
> > >
> > > tks
> > > --
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Claudio Devecchi
> > flickr.com/cdevecchi
> >
> >
> >
> >
>
>
>
>
>



-- 
Claudio Devecchi
flickr.com/cdevecchi



      