manifoldcf-dev mailing list archives

From Antonio David Pérez Morales (JIRA) <>
Subject [jira] [Commented] (CONNECTORS-1251) Confluence umlauts broken
Date Thu, 05 Nov 2015 09:08:27 GMT


Antonio David Pérez Morales commented on CONNECTORS-1251:

[~jan0sch] are you using Solr as the index backend?
The problem may come from the field configuration in your Solr schema. As far as I remember,
the Confluence connector crawls content through the Confluence REST API and uses UTF-8 as the
encoding, so umlauts should come through correctly.
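A common cause of broken umlauts is UTF-8 bytes being decoded as ISO-8859-1 (Latin-1) somewhere between the source and the index, which turns each umlaut into two characters. The Java sketch below reproduces that failure mode and reverses it. It assumes this is the kind of corruption involved, which the thread has not confirmed, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    // Each umlaut becomes two bytes (0xC3 plus one more), shown as two characters.
    static String latin1Misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: re-encode as ISO-8859-1 to recover the original bytes,
    // then decode them correctly as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E4 \u00F6 \u00FC";    // "ä ö ü"
        String garbled = latin1Misread(original);    // holds "Ã¤ Ã¶ Ã¼"
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

If the corrupted text in the index matches the pattern `latin1Misread` produces, that points to a charset mismatch somewhere in the pipeline, although this sketch alone cannot tell which component introduced it.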

Can you test this by adding a FileSystemOutputConnector to your job and checking whether the
written files contain the umlauts?
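A hypothetical helper for the check suggested above: it walks a directory, reads each file as UTF-8, and reports whether it contains real umlauts or the two-character sequences typical of UTF-8 misread as Latin-1. It assumes the written files are UTF-8 text located in a directory passed as the first argument; `UmlautCheck` and `classify` are illustrative names, not part of ManifoldCF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class UmlautCheck {
    // German umlauts and sharp s as proper Unicode characters (ä ö ü Ä Ö Ü ß).
    private static final Pattern UMLAUT =
        Pattern.compile("[\\u00E4\\u00F6\\u00FC\\u00C4\\u00D6\\u00DC\\u00DF]");
    // "Ã" (U+00C3) followed by a character in U+0080..U+00BF: the typical trace
    // of UTF-8 umlaut bytes that were decoded as ISO-8859-1.
    private static final Pattern MOJIBAKE = Pattern.compile("\\u00C3[\\u0080-\\u00BF]");

    // Mojibake takes priority, since a file may contain both kinds of text.
    static String classify(String text) {
        if (MOJIBAKE.matcher(text).find()) return "mojibake";
        if (UMLAUT.matcher(text).find()) return "ok";
        return "no-umlauts";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output directory of the file system output connector.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(Files::isRegularFile).forEach(p -> {
                try {
                    String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                    System.out.println(classify(text) + "\t" + p);
                } catch (IOException e) {
                    System.err.println("unreadable\t" + p);
                }
            });
        }
    }
}
```

Run it as `java UmlautCheck <output-dir>`; it prints one line per file tagged `ok`, `mojibake`, or `no-umlauts`. Files tagged `ok` would suggest the connector side is fine and the problem lies further downstream.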

> Confluence umlauts broken
> -------------------------
>                 Key: CONNECTORS-1251
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Confluence connector
>    Affects Versions: ManifoldCF 2.2
>         Environment: Ubuntu Linux 14.04
> Java 1.8.0_51-b16
> Tomcat 7.0.52
>            Reporter: Jens Grassel
>            Assignee: Antonio David Pérez Morales
>              Labels: umlauts, unicode
>             Fix For: ManifoldCF 2.3
> Hi,
> I've noticed that the Confluence connector seems unable to cope with special characters
> like umlauts (ä, ö, ü, etc.). In our index they are broken; for example, {{ü}} becomes {{Ã¼}}.
> I tried to pipe the extracted content through the Tika extractor, but the result was the same.
> Regards,
> Jens

This message was sent by Atlassian JIRA
