manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1251) Confluence umlauts broken
Date Sun, 29 Nov 2015 11:48:11 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15030893#comment-15030893 ]

Karl Wright commented on CONNECTORS-1251:
-----------------------------------------

I've looked at the code, and could find no obvious encoding issues.  Specifically, I looked
at this:

{code}
  private <T extends ConfluenceResource> ConfluenceResponse<T> responseFromHttpEntity(HttpEntity entity, ConfluenceResourceBuilder<T> builder)
      throws Exception {
    String stringEntity = EntityUtils.toString(entity);

    JSONObject responseObject;
    try {
      responseObject = new JSONObject(stringEntity);
      ConfluenceResponse<T> response = ConfluenceResponse
          .fromJson(responseObject, builder);
      if (response.getResults().size() == 0) {
        logger.debug("[Processing] No {} found in the Confluence response", builder.getType().getSimpleName());
      }

      return response;
    } catch (JSONException e) {
      logger.debug("Error parsing JSON response");
      throw new Exception();
    }

  }
{code}

... which calls EntityUtils.toString(), which should be sufficient.
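
One case where it might not be sufficient: as far as the HttpClient 4.x Javadoc describes it, the one-argument EntityUtils.toString(entity) decodes with the charset declared on the entity and falls back to ISO-8859-1 when none is declared, while a two-argument overload accepts a default charset. Whether the Confluence response actually omits the charset is an assumption here, not something verified. The sketch below uses only the JDK (the class and method names are hypothetical) to show what decoding UTF-8 bytes as ISO-8859-1 does to an umlaut:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Confluence connector.
final class CharsetMismatchDemo {

  /** Decodes raw response bytes with the given charset. */
  static String decode(byte[] body, Charset charset) {
    return new String(body, charset);
  }

  public static void main(String[] args) {
    // "\u00fc" is u-umlaut; UTF-8 encodes it as the two bytes C3 BC.
    byte[] body = "\u00fc".getBytes(StandardCharsets.UTF_8);

    // Decoding with the wrong charset turns each byte into its own character.
    String wrong = decode(body, StandardCharsets.ISO_8859_1);
    // Decoding with the charset the bytes were written in restores the umlaut.
    String right = decode(body, StandardCharsets.UTF_8);

    System.out.println(wrong.equals("\u00c3\u00bc")); // prints true
    System.out.println(right.equals("\u00fc"));       // prints true
  }
}
```

If that is what happens here, passing an explicit default such as StandardCharsets.UTF_8 to the two-argument EntityUtils.toString would be one possible change to try.
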
There is some concern that doing all of this in memory is not a good idea; we usually stream
content that can be unbounded, rather than convert to a single string.
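
A minimal sketch of the streaming alternative, using only the JDK: the body is read through a Reader with an explicit charset and copied in fixed-size chunks, so memory use stays bounded regardless of response size. The class and method names are hypothetical, and the StringWriter in main is only a stand-in sink; in the connector the stream would presumably come from HttpEntity.getContent() and feed a streaming JSON parser, neither of which is shown.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Confluence connector.
final class StreamingCopyDemo {

  /**
   * Copies characters from a UTF-8 byte stream to a writer in 8 KiB chunks.
   * Returns the number of chars (UTF-16 code units) copied.
   */
  static long copy(InputStream in, Writer out) throws IOException {
    // The charset is stated explicitly instead of relying on a default.
    Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
    char[] buffer = new char[8192];
    long total = 0;
    int n;
    while ((n = reader.read(buffer)) != -1) {
      out.write(buffer, 0, n);
      total += n;
    }
    out.flush();
    return total;
  }

  public static void main(String[] args) throws IOException {
    // "Gr\u00fc\u00dfe" is a five-character word containing u-umlaut and sharp s.
    byte[] body = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
    StringWriter sink = new StringWriter();
    long copied = copy(new ByteArrayInputStream(body), sink);
    System.out.println(copied);                                     // prints 5
    System.out.println(sink.toString().equals("Gr\u00fc\u00dfe")); // prints true
  }
}
```
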


> Confluence umlauts broken
> -------------------------
>
>                 Key: CONNECTORS-1251
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1251
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Confluence connector
>    Affects Versions: ManifoldCF 2.2
>         Environment: Ubuntu Linux 14.04
> Java 1.8.0_51-b16
> Tomcat 7.0.52
>            Reporter: Jens Grassel
>            Assignee: Antonio David Pérez Morales
>              Labels: umlauts, unicode
>             Fix For: ManifoldCF 2.3
>
>
> Hi,
> I've noticed that the Confluence connector seems to be unable to cope with special characters
> like umlauts (ä, ö, ü, etc.). In our index they are broken; for example, {{ü}} becomes {{Ã¼}}.
> I tried to pipe the extracted content through the Tika extractor, but the result was the
> same.
> Regards,
> Jens



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
