lucene-solr-user mailing list archives

From Lucas Teixeira <lucas.teixe...@accurate.com.br>
Subject Accented chars (Portuguese)
Date Thu, 28 Feb 2008 11:50:28 GMT
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<font size="-1"><font face="DejaVu Sans">Hello all,<br>
<br>
I'm using the <u>solr.ISOLatin1AccentFilterFactory</u> TokenFilter in
my schema.xml inside both the &lt;index&gt; and &lt;query&gt; tags, but I
keep having problems with accented chars in Portuguese
(&aacute;&eacute;&iacute;&oacute;&uacute;&agrave;&egrave;&igrave;&ograve;&ugrave;&atilde;&#297;&otilde;&#361;&auml;&euml;&iuml;&ouml;&uuml;.....).
As a result, my search engine handles
queries containing these chars abnormally.<br>
<br>
I think the ISOLatin1 filter is being applied, since I get the same results
whether I search with the accented chars or not. My problem is that the
filter seems to be just dropping these chars instead of replacing them
with their unaccented equivalents (as its docs say). For example, I've indexed
one document with the title:<br>
<br>
<b>Barraca Cocoric&oacute; - Multibrink<br>
<br>
</b>When I query for the word <b>cocoric&oacute;</b>, I can't get the
document. When I search for the word <b>cocorico</b>, I still can't get
this document. But when I search for <b>cocoric</b>, my document is
returned.<br>
<br>
This is my indexing schema<br>
<br>
Has anybody had this same problem before?<br>
<br>
Thank you all,<br>
<br>
[]s,<br>
<br>
Lucas<b><br>
</b></font></font>
<br>
</body>
</html>
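
To make the expected behaviour from the message concrete (cocoricó folding to cocorico rather than being cut down to cocoric), here is a minimal sketch in plain Java. It does not use Solr or Lucene and is not how ISOLatin1AccentFilter is implemented: it swaps in `java.text.Normalizer` to decompose accented characters and strip the combining marks. The class name `AccentFoldingSketch` and the helper `fold` are hypothetical, and the lowercasing step is an assumption about the rest of the analyzer chain.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFoldingSketch {

    // Hypothetical helper, not part of Solr or Lucene: decompose each accented
    // character into base letter + combining mark, then drop the marks.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches Unicode combining marks such as the acute accent.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Assumption: the analyzer chain also lowercases tokens.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00f3 is the letter o with an acute accent, escaped to avoid
        // source-encoding problems.
        System.out.println(fold("Barraca Cocoric\u00f3 - Multibrink"));
        // Accented and unaccented query terms should fold to the same text.
        System.out.println(fold("cocoric\u00f3").equals(fold("cocorico")));
    }
}
```

If the indexed and queried terms both pass through a step like this, a search for either cocoricó or cocorico should match the title, which is the result the message was expecting from the filter.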
