manifoldcf-dev mailing list archives

From Markus Schuch <markus_sch...@web.de>
Subject Do we support UTF-16 chars in version strings when using MySQL/MariaDB?
Date Wed, 23 Jan 2019 08:00:37 GMT
Hi,

while using MySQL/MariaDB for MCF I ran into a deadlock-like situation
caused by a character outside the Basic Multilingual Plane (e.g. U+1F3AE,
which needs a surrogate pair in UTF-16 and four bytes in UTF-8) in a
string inserted into one of the varchar columns.

In my case a connector wrote the title of a parent document into the
version string of the processed document, and that title contained the
character U+1F3AE - a gamepad :)

This led to SQL Error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
for column 'lastversion' at row 1" in MySQL, because the utf8 character
set does not support that kind of character (utf8mb4 does).
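
For illustration, a minimal Java sketch of why this character is a
problem. The class and method names are made up for this example and are
not MCF code. It shows that U+1F3AE is two UTF-16 code units in a Java
String and four bytes in UTF-8, while MySQL's utf8 (utf8mb3) stores at
most three bytes per character:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class GamepadBytes {
    // Formats the UTF-8 encoding of a string as \xNN escapes,
    // the same notation MySQL uses in its error message.
    static String utf8Escapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String gamepad = new String(Character.toChars(0x1F3AE));
        System.out.println(gamepad.length());      // prints 2 (a surrogate pair)
        System.out.println(utf8Escapes(gamepad));  // prints \xF0\x9F\x8E\xAE
    }
}
```

The four bytes match the value quoted in the error message.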

The cause was hard to find because it somehow led to a transaction
abort loop in the incremental ingester, and the error was not logged
properly.

My question:
- should we create the MySQL database with utf8mb4 by default?
- or should inserted strings be sanitized to remove characters outside
  the BMP?
- or should 22001 be handled better?
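
To make the second option concrete, here is a rough sketch of what such
a sanitizing step could look like. This is only an assumption about one
possible approach, not existing MCF code; the class name
VersionStringSanitizer and the method replaceSupplementary are
hypothetical. It replaces every code point above U+FFFF with U+FFFD,
which fits into three UTF-8 bytes:

```java
// Hypothetical sketch, not part of ManifoldCF.
class VersionStringSanitizer {
    // Replaces each supplementary code point (above U+FFFF) with the
    // Unicode replacement character U+FFFD; all other characters are
    // copied unchanged.
    static String replaceSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append('\uFFFD');
            } else {
                sb.appendCodePoint(cp);
            }
            // Advance by one or two UTF-16 code units.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```

One drawback of this approach: two version strings that differ only in
such characters would become identical after sanitizing, so a change of
that kind would no longer be detected.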

Thanks in advance
Markus
