manifoldcf-user mailing list archives

From Marisol Redondo <marisol.redondo.gar...@gmail.com>
Subject Re: UTF-8 Format from Confluence to Solr
Date Mon, 12 Jun 2017 15:25:07 GMT
How can I do that?

On 1 June 2017 at 16:43, Antonio David Pérez Morales <
adperezmorales@gmail.com> wrote:

> Hi Marisol
>
> Would you mind creating a ticket and providing a patch?
>
> That way we can test it on our end and include it in the next Manifold
> release.
>
> Thanks
>
> Regards
>
> 2017-06-01 16:28 GMT+02:00 Marisol Redondo <marisol.redondo.garcia@gmail.
> com>:
>
>> I fixed the problem.
>>
>> The problem is that the Confluence connector reads the entity of the
>> request with the default encoding ("ISO-8859-1") instead of UTF-8.
>>
>> To fix that, I changed the Confluence connector so that each time it
>> reads the request's entity it uses EntityUtils.toString(entity, "UTF-8").
>>
>> Thanks
>>
>>
>> On 31 May 2017 at 10:13, Marisol Redondo <marisol.redondo.garcia@gmail.
>> com> wrote:
>>
>>> Hi.
>>>
>>> I'm having problems with the encoding when injecting in Solr 6 in
>>> standalone mode from a Confluence wiki.
>>>
>>> I have Manifold 2.5 with Tomcat-8.
>>>
>>> The job's repository connector takes the information from a
>>> Confluence wiki and the output connector is Solr, using the Tika
>>> transformation, a custom transformation and a Metadata adjuster.
>>>
>>> When the document is injected into Solr, its content has some
>>> characters that shouldn't be there because they are not in the
>>> Confluence page, mainly a  character.
>>>
>>> I have checked Confluence and the Tomcat server where Manifold is
>>> running: the HTTP request to Confluence has the Accept-Charset header
>>> set to UTF-8, and the Solr server is accepting UTF-8.
>>>
>>> In the log, I have seen that when the information is retrieved from
>>> Confluence the content is fine, and when it is sent to Solr it has the
>>> character. I have also tried without any transformer and I get the
>>> same log entry.
>>>
>>> Is this a bug or how can I resolve this?
>>>
>>> Thanks for your help
>>>
>>>
>>>
>>>
>>>
>>
>
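
The fix quoted above can be illustrated without a Confluence server. The following is a minimal sketch using only the JDK, not the connector's actual code: it encodes a sample string as UTF-8 and then decodes the bytes with both charsets. The class name CharsetDemo, the decode helper, and the sample text are assumptions made for illustration. In Apache HttpClient 4.x, the charset passed to EntityUtils.toString(entity, "UTF-8") is used as the default when the entity does not declare one itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode raw response bytes with the given charset.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Sample text containing an e-acute (U+00E9) and a non-breaking space (U+00A0).
        String original = "Caf\u00e9\u00a0wiki";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Decoding UTF-8 bytes as ISO-8859-1 turns each multi-byte sequence
        // into two separate characters.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println("UTF-8 decode matches original:      " + right.equals(original));
        System.out.println("ISO-8859-1 decode matches original: " + wrong.equals(original));
        System.out.println("stray U+00C2 present in bad decode: " + (wrong.indexOf('\u00c2') >= 0));
    }
}
```

A non-breaking space is the two bytes C2 A0 in UTF-8, so reading it as ISO-8859-1 produces U+00C2 followed by U+00A0. This is one common way an unexpected extra character ends up in indexed content.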
