manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: UTF-8 Format from Confluence to Solr
Date Mon, 12 Jun 2017 23:34:14 GMT
Committed a fix.
Karl


On Mon, Jun 12, 2017 at 7:27 PM, Karl Wright <daddywri@gmail.com> wrote:

> There's already a ticket for this, assigned to me.  CONNECTORS-1251.  I'll
> freshen it up.
>
> Karl
>
>
>
>
> On Mon, Jun 12, 2017 at 2:52 PM, Furkan KAMACI <furkankamaci@gmail.com>
> wrote:
>
>> Hi Marisol,
>>
>> You can create a ticket from here:
>> https://issues.apache.org/jira/projects/CONNECTORS
>>
>> Kind Regards,
>> Furkan KAMACI
>>
>>
>> On Mon, 12 Jun 2017 at 18:25, Marisol Redondo <
>> marisol.redondo.garcia@gmail.com> wrote:
>>
>>> How can I do that?
>>>
>>> On 1 June 2017 at 16:43, Antonio David Pérez Morales <
>>> adperezmorales@gmail.com> wrote:
>>>
>>>> Hi Marisol
>>>>
>>>> Would you mind creating a ticket and providing a patch?
>>>>
>>>> That way we can test it on our end and include it in the next
>>>> Manifold release.
>>>>
>>>> Thanks
>>>>
>>>> Regards
>>>>
>>>> 2017-06-01 16:28 GMT+02:00 Marisol Redondo <
>>>> marisol.redondo.garcia@gmail.com>:
>>>>
>>>>> I fixed the problem.
>>>>>
>>>>> The problem is that the Confluence connector reads the entity of
>>>>> the request using the default encoding ("ISO-8859-1") rather than UTF-8.
>>>>>
>>>>> To fix that, I changed the Confluence connector so that every time it
>>>>> reads the request's entity it uses EntityUtils.toString(entity, "UTF-8").
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> On 31 May 2017 at 10:13, Marisol Redondo <
>>>>> marisol.redondo.garcia@gmail.com> wrote:
>>>>>
>>>>>> Hi.
>>>>>>
>>>>>> I'm having encoding problems when injecting content from a Confluence
>>>>>> wiki into Solr 6 running in standalone mode.
>>>>>>
>>>>>> I have Manifold 2.5 with Tomcat-8.
>>>>>>
>>>>>> The job's repository connector takes the information from a
>>>>>> Confluence wiki and the output connector is Solr, using the Tika
>>>>>> transformation, a custom transformation and a Metadata adjuster.
>>>>>>
>>>>>> When the document is injected into Solr, its content contains some
>>>>>> characters that shouldn't be there because they are not in the
>>>>>> Confluence page, mainly a  character.
>>>>>>
>>>>>> I have checked Confluence and the Tomcat server where Manifold is
>>>>>> running; the HTTP request to Confluence has the Accept-Charset header
>>>>>> set to UTF-8, and the Solr server is accepting UTF-8.
>>>>>>
>>>>>> In the log, I have seen that when the information is retrieved from
>>>>>> Confluence the content is fine, and when it is sent to Solr it has
>>>>>> the character. I have also tried without using any transformer and
>>>>>> got the same log entry.
>>>>>>
>>>>>> Is this a bug or how can I resolve this?
>>>>>>
>>>>>> Thanks for your help
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
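
Note on the fix described in the thread: the symptom is what you would expect when UTF-8 bytes are decoded as ISO-8859-1. The sketch below is not the connector's code and does not use HttpClient; it is a standard-library illustration of the same mismatch, with the class name CharsetDemo and the sample string chosen purely for the example. As far as I know, the second argument of EntityUtils.toString(entity, "UTF-8") in HttpClient 4.x is the charset used when the response does not declare one, so treat that as an assumption and check it against the version you run.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode a raw response body with an explicit charset
    // instead of relying on whatever default applies.
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Sample text with an accented letter and a non-breaking space,
        // encoded as UTF-8 the way a server would send it.
        String original = "caf\u00e9\u00a0";
        byte[] body = original.getBytes(StandardCharsets.UTF_8);

        // Wrong: each UTF-8 byte becomes its own character,
        // so two-byte sequences turn into two stray characters.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        // Right: the bytes are decoded with the charset they were written in.
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println(wrong.length());          // 7
        System.out.println(right.length());          // 5
        System.out.println(right.equals(original));  // true
    }
}
```

The lengths are printed rather than the strings themselves so that the output does not depend on the console's own encoding.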
