lucene-solr-user mailing list archives

From Óscar Marín Miró <oscarmarinm...@gmail.com>
Subject Re: Problem with UTF-8 and Solr ISOLatin1AccentFilterFactory
Date Fri, 20 Mar 2009 13:10:23 GMT
Hi,
Maybe this info is handy for you:

http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

The fact is that MySQL can store UTF-8 in its storage engine (or per
database), as yours does, while the *connection* to the MySQL client is
still set to latin1.
In fact, here are my character_set variables:

character_set_client = latin1
character_set_connection = latin1
character_set_database = utf8
character_set_filesystem = binary
character_set_results = latin1
character_set_server = latin1
character_set_system = utf8
character_sets_dir = /usr/share/mysql/charsets/
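
As a rough illustration of why the connection charset matters even when the
storage is utf8, here is a minimal Python sketch. It is a simplification, not
MySQL's exact conversion path: Python's latin-1 codec stands in for MySQL's
latin1, and the function name and sample string are made up for the example.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those bytes as if they were latin1.

    This mimics one side of the link sending UTF-8 bytes while the other
    side believes the bytes are latin1.
    """
    utf8_bytes = text.encode("utf-8")   # what is really on the wire
    return utf8_bytes.decode("latin-1")  # what a latin1 reader thinks it got


sample = "Miró"  # hypothetical sample value
garbled = misread_as_latin1(sample)

# The single character "ó" is two bytes in UTF-8 (0xC3 0xB3), and a latin1
# reader turns each byte into its own character.
print(garbled)  # MirÃ³
```

If accented characters in your index look like the garbled value, a mismatch
of this kind somewhere between MySQL and Solr is a likely suspect.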

As you can see, the database is in utf8, *but* the client, connection,
results and server all expect latin1. You can see these variables in a mysql
console by typing:

$ mysql -u user -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8114
Server version: 5.0.32-Debian_7etch5-log Debian etch distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

and change them like this:

mysql> SET character_set_client = utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

So... maybe setting every variable that is currently latin1 to utf8 will
solve your problem? If they are set to latin1, of course ;)
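
The idea of "set everything that is latin1 to utf8" can be sketched in Python
as below. This is only a hedged sketch: the dictionary is copied by hand from
the first SHOW VARIABLES output above, the helper name `utf8_statements` is
made up, and the generated statements are meant to be pasted into a mysql
console, where a plain SET only affects the current session.

```python
# Values copied from the SHOW VARIABLES output (character_sets_dir omitted,
# since it is a directory path, not a charset).
variables = {
    "character_set_client": "latin1",
    "character_set_connection": "latin1",
    "character_set_database": "latin1",
    "character_set_filesystem": "binary",
    "character_set_results": "latin1",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
}


def utf8_statements(current: dict) -> list:
    """Return one SET statement for each variable currently set to latin1."""
    return [
        f"SET {name} = utf8;"
        for name, value in current.items()
        if value == "latin1"
    ]


for statement in utf8_statements(variables):
    print(statement)
```

For the three connection-related variables (client, connection and results),
MySQL also offers the shorthand `SET NAMES utf8;`, which sets all three in one
statement.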

If this is not the problem, hell, we escaped from work just for a few
minutes :P

On Fri, Mar 20, 2009 at 1:25 PM, aerox7 <amyne.berrada@me.com> wrote:

>
> My database is already in UTF-8 (collation and charset).
>
> I already set the Tomcat connector to UTF-8, and the MySQL default charset
> to UTF-8... How do I force MySQL to send UTF-8? (Or maybe I have to do this
> for Tomcat?)
>
> I'm going crazy... :)
>
>
> Shalin Shekhar Mangar wrote:
> >
> > On Fri, Mar 20, 2009 at 5:34 PM, aerox7 <amyne.berrada@me.com> wrote:
> >
> >>
> >> Yes! I completely understand the problem. I'm just asking about your
> >> solution to resolve this problem.
> >>
> >> I guess that you use the Solr Perl client to index your database. In my
> >> case I use DataImportHandler, so the only solution that I have is to
> >> create a transformer for DataImportHandler and try to convert my rows
> >> from Latin-1 to UTF-8 (see
> >> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> >> ).
> >>
> >> So I just want to know if you use DataImportHandler too, with a Perl
> >> script as a transformer?
> >>
> >
> > No, but you can use any language which is available on the Java VM. For
> > example, JavaScript (available by default on JDK6), JRuby, Jython, Groovy,
> > BeanShell, etc.
> >
> > But you may not need to do so much. Look at
> >
> > http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
> >
>


-- 
“I may not believe in myself, but I believe in what I'm doing.”

-- Jimmy Page
