manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Fetching output Elastic Search data in pipelines
Date Wed, 07 Mar 2018 13:02:14 GMT
Hi Nikita,

You have not selected the "use mapper attachment" checkbox in the
configuration for the ES output connector.  But you are using it in Elastic
Search.  The ES output connector will not convert binary to base64 unless
you check that box.

Karl
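A note on the error in the quoted thread: the attachment processor in an Elasticsearch ingest pipeline expects its source field to hold base64 text. The character in "Illegal base64 character 3f" is hex 3f, i.e. `?`, which is not part of the base64 alphabet. The Python sketch below illustrates that rule with the standard library's strict decoder. It is not ManifoldCF or Elasticsearch code, the sample bytes are made up, and the field name `data` is an assumption that must match the `field` setting of your own attachment processor.

```python
import base64
import binascii
import json


def to_attachment_field(raw: bytes) -> str:
    """Base64-encode raw document bytes, as an attachment processor expects."""
    return base64.b64encode(raw).decode("ascii")


def is_strict_base64(text: str) -> bool:
    """True only if every character is in the base64 alphabet and padding is valid."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except binascii.Error:
        return False


# Stand-in for a crawled binary document (made-up content).
raw = b"%PDF-1.4\n\x00\xff\xfe binary payload"

encoded = to_attachment_field(raw)
# Hypothetical document body: "data" must match the `field` configured
# on the attachment processor in your ingest pipeline.
body = json.dumps({"data": encoded})

# 0x3f is the character "?", which is not in the base64 alphabet,
# so text containing it is rejected instead of being decoded.
print(hex(ord("?")))                      # 0x3f
print(is_strict_base64(encoded))          # True
print(is_strict_base64("JVBERi0?"))       # False
print(base64.b64decode(encoded) == raw)   # True
```

Sending raw, unencoded content to such a processor produces exactly this kind of decode failure, which is why the connector's base64 conversion has to be enabled when the pipeline uses attachments.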


On Wed, Mar 7, 2018 at 6:18 AM, Nikita Ahuja <nikita@smartshore.nl> wrote:

> Hi Karl,
>
>
> This is not only happening with SharePoint; it is the same for the FileShare
> and Web crawler connectors.
>
> For the Elastic Search output connector, the following parameters are defined:
>
>
>
>
> In the Simple History tab, the following errors appear:
>
>
>
> A server exception like this one is returned every time it attempts
> indexing:
>
>
> Server exception:
> {
>   "error": {
>     "root_cause": [
>       {
>         "type": "exception",
>         "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Illegal base64 character 3f",
>         "header": {"processor_type": "attachment"}
>       }
>     ],
>     "type": "exception",
>     "reason": "java.lang.IllegalArgumentException: java.lang.IllegalArgumentException: Illegal base64 character 3f",
>     "caused_by": {
>       "type": "illegal_argument_exception",
>       "reason": "java.lang.IllegalArgumentException: Illegal base64 character 3f",
>       "caused_by": {
>         "type": "illegal_argument_exception",
>         "reason": "Illegal base64 character 3f"
>       }
>     },
>     "header": {"processor_type": "attachment"}
>   },
>   "status": 500
> }
>
>
>
> But if we don't define any value in the pipeline tab, the documents go
> directly into the index. There is some problem with the code. I need to use
> different pipelines in the same index, e.g. "web" for the website and
> "file" for FileShare.
>
>
> Thanks and Regards,
> Nikita
>
>
>
>
>
>
> On Wed, Mar 7, 2018 at 2:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Nikita,
>>
>> The downstream pipeline for a connector determines which mime types are
>> indexed and which are rejected.  If you look in the Simple History report
>> for one of the rejected SharePoint documents, there should be information
>> recorded about why it was rejected.  If no non-image documents from
>> SharePoint are described at all, then the issue would have to be how the
>> SharePoint repository connection in the job is specified.
>>
>> Thanks,
>> Karl
>>
>>
>> On Wed, Mar 7, 2018 at 2:29 AM, Nikita Ahuja <nikita@smartshore.nl>
>> wrote:
>>
>>> Hi Karl,
>>>
>>>
>>> I am trying to ingest data from a website and SharePoint into the Elastic
>>> Search output, using different pipelines in the same index.
>>>
>>> But ManifoldCF is not able to ingest all the data. It only puts the image
>>> files present in the source into the ElasticSearch output.
>>>
>>> Is there anything I am missing?
>>>
>>>
>>> Please advise on a solution.
>>>
>>> Thanks and Regards,
>>> Nikita
>>>
>>
>>
>
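Nikita's goal of sending each repository through a different ingest pipeline while writing to one index corresponds to the `pipeline` query parameter of the Elasticsearch index API. The Python sketch below only builds the request (method, URL, body) and does not send it. The index name, document IDs, the source-to-pipeline mapping and the `data` field are hypothetical, and it assumes the typeless `_doc` endpoint of Elasticsearch 7 and later (6.x clusters put a mapping type name in that position).

```python
import base64
import json
from urllib.parse import quote, urlencode

# Hypothetical mapping from source to ingest pipeline name, following the
# example in the thread ("web" for the website, "file" for FileShare).
PIPELINES = {"web": "web", "fileshare": "file"}


def index_request(es_url: str, index: str, doc_id: str, source: str, raw: bytes):
    """Return (method, url, body) for indexing one document via a per-source pipeline."""
    pipeline = PIPELINES[source]
    # Document IDs from crawlers are often URLs, so escape every reserved character.
    safe_id = quote(doc_id, safe="")
    url = f"{es_url}/{index}/_doc/{safe_id}?{urlencode({'pipeline': pipeline})}"
    # The attachment processor decodes base64, so encode the raw bytes first.
    body = json.dumps({"data": base64.b64encode(raw).decode("ascii")})
    return "PUT", url, body
```

For example, `index_request("http://localhost:9200", "docs", "doc-1", "web", b"hello")` returns `("PUT", "http://localhost:9200/docs/_doc/doc-1?pipeline=web", '{"data": "aGVsbG8="}')`. Both sources land in the same index; only the pipeline named in the query string differs.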
