manifoldcf-user mailing list archives

From Juan Pablo Diaz-Vaz <jpdiaz...@mcplusa.com>
Subject Re: Amazon CloudSearch Connector question
Date Mon, 08 Feb 2016 20:07:39 GMT
I'm actually not seeing anything on Amazon. The CloudSearch connector fails
when sending the request to Amazon CloudSearch:

AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
"errors": [{"message": "[*Deprecated*: Use the outer message field]
Encountered unexpected end of file"}], "adds": 0, "__type":
"#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
file\"] }", "deletes": 0}'

ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
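For context on the error above: the CloudSearch 2013-01-01 documents/batch endpoint expects a JSON array of add/delete operations. The sketch below (document id and field names are made up, not taken from the actual job) shows what a well-formed chunk looks like, and how an empty chunk reproduces an "unexpected end of file" style parse failure on the receiving side:

```python
import json

# A well-formed CloudSearch batch: a JSON array of operations.
# The id and field names here are illustrative placeholders.
batch = [
    {
        "type": "add",
        "id": "doc-1",
        "fields": {
            "title": "Example document",
            "body": "Extracted text would normally appear here.",
        },
    }
]
payload = json.dumps(batch)
print(json.loads(payload)[0]["fields"]["body"])  # round-trips cleanly

# An empty chunk, by contrast, is not valid JSON at all -- parsing it
# fails much like the service's "Encountered unexpected end of file".
try:
    json.loads("")
except json.JSONDecodeError as e:
    print("parse failed:", e.msg)
```

So an empty document body stored in the chunk table would plausibly surface as exactly this service-side error.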



On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <daddywri@gmail.com> wrote:

> If you can possibly include a snippet of the JSON you are seeing on the
> Amazon end, that would be great.
>
> Karl
>
>
> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> More likely this is a bug.
>>
>> I take it that it is the body string that is not coming out, correct?  Do
>> all the other JSON fields look reasonable?  Does the body clause exist and
>> is just empty, or is it not there at all?
>>
>> Karl
>>
>>
>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <
>> jpdiazvaz@mcplusa.com> wrote:
>>
>>> Hi,
>>>
>>> When running a copy of the job, but with Solr as the target, I'm seeing
>>> the expected content being posted to Solr, so it may not be an issue with
>>> Tika. After adding some more logging to the CloudSearch connector, I think
>>> the data is getting lost just before it is passed to the
>>> DocumentChunkManager, which inserts the empty records into the DB. Could it
>>> be that the JSONObjectReader doesn't like my data?
>>>
>>> Thanks,
>>>
>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Juan,
>>>>
>>>> I'd try to reproduce as much of the pipeline as possible using a Solr
>>>> output connection.  If you include the Tika extractor in the pipeline,
>>>> you will want to configure the Solr connection not to use the extracting
>>>> update handler.  There's a checkbox on the Schema tab you need to uncheck
>>>> for that.  Once you do, you can see pretty exactly what is being sent to
>>>> Solr; it all gets logged in the INFO messages written to the Solr log.
>>>> This should help you figure out whether the problem is your Tika
>>>> configuration or not.
>>>>
>>>> Please give this a try and let me know what happens.
>>>>
>>>> Karl
>>>>
>>>>
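Karl's suggestion above can be approximated outside ManifoldCF as well: with the extracting update handler disabled, documents reach Solr as plain JSON posted to the /update handler, so you can compare by sending already-extracted content yourself. A minimal sketch (core name, URL, and field names are placeholders for your own setup); it only builds and prints the request rather than sending it:

```python
import json
from urllib import request

# Placeholder core and field names -- substitute your own Solr setup.
solr_update_url = "http://localhost:8983/solr/mycore/update?commit=true"

docs = [{"id": "doc-1", "content": "Text that Tika should have extracted."}]
body = json.dumps(docs).encode("utf-8")

req = request.Request(
    solr_update_url,
    data=body,
    headers={"Content-Type": "application/json"},
)
# Not executed here; sending req with request.urlopen(req) would index the
# document, and the same payload appears in Solr's INFO log for comparison.
print(req.get_method(), req.full_url)
print(body.decode("utf-8"))
```

Comparing this payload against what the CloudSearch connector stages should show at which stage the body content goes missing.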
>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <
>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've successfully sent data to FileSystems and Solr, but for Amazon
>>>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>>>> domain. I think this may be an issue with how I've set up the Tika
>>>>> Extractor transformation or the field mapping. I believe the database
>>>>> where records are staged before being flushed to Amazon is storing
>>>>> empty content.
>>>>>
>>>>> I've tried to find documentation on how to set up the Tika
>>>>> transformation, but I haven't been able to find any.
>>>>>
>>>>> If someone could provide an example of a job setup to send from a
>>>>> FileSystem to CloudSearch, that'd be great!
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> --
>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>> Full Stack Developer - MC+A Chile
>>>>> +56 9 84265890
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Juan Pablo Diaz-Vaz Varas,
>>> Full Stack Developer - MC+A Chile
>>> +56 9 84265890
>>>
>>
>>
>


-- 
Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890
