manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Amazon CloudSearch Connector question
Date Mon, 08 Feb 2016 21:04:23 GMT
Ok, I'm debugging away, and I can confirm that no data is getting through.
I'll have to open a ticket and create a patch when I find the problem.

Karl


On Mon, Feb 8, 2016 at 3:15 PM, Juan Pablo Diaz-Vaz <jpdiazvaz@mcplusa.com>
wrote:

> Thank you very much.
>
> On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Ok, thanks, this is helpful -- it clearly sounds like Amazon is unhappy
>> about the JSON format we are sending it.  The deprecation message is
>> probably a strong clue.  I'll experiment here with logging document
>> contents so that I can give you further advice.  Stay tuned.
>>
>> Karl
>>
>>
>> On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <
>> jpdiazvaz@mcplusa.com> wrote:
>>
>>> I'm actually not seeing anything on Amazon. The CloudSearch connector
>>> fails when sending the request to amazon cloudsearch:
>>>
>>> AmazonCloudSearch: Error sending document chunk 0: '{"status": "error",
>>> "errors": [{"message": "[*Deprecated*: Use the outer message field]
>>> Encountered unexpected end of file"}], "adds": 0, "__type":
>>> "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
>>> file\"] }", "deletes": 0}'
>>>
>>> ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
>>>
>>>
>>>
>>> On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> If you can possibly include a snippet of the JSON you are seeing on the
>>>> Amazon end, that would be great.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> More likely this is a bug.
>>>>>
>>>>> I take it that it is the body string that is not coming out, correct?
>>>>> Do all the other JSON fields look reasonable?  Does the body clause exist
>>>>> and is just empty, or is it not there at all?
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <
>>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> When running a copy of the job, but with SOLR as a target, I'm seeing
>>>>>> the expected content being posted to SOLR, so it may not be an issue
>>>>>> with TIKA. After adding some more logging to the CloudSearch connector,
>>>>>> I think the data is getting lost just before passing it to the
>>>>>> DocumentChunkManager, which inserts the empty records to the DB.
>>>>>> Could it be that the JSONObjectReader doesn't like my data?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Juan,
>>>>>>>
>>>>>>> I'd try to reproduce as much of the pipeline as possible using a
>>>>>>> solr output connection.  If you include the tika extractor in the
>>>>>>> pipeline, you will want to configure the solr connection to not use
>>>>>>> the extracting update handler.  There's a checkbox on the Schema tab
>>>>>>> you need to uncheck for that.  But if you do that you can see what is
>>>>>>> being sent to Solr pretty exactly; it all gets logged in the INFO
>>>>>>> messages dumped to solr log.  This should help you figure out if the
>>>>>>> problem is your tika configuration or not.
>>>>>>>
>>>>>>> Please give this a try and let me know what happens.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <
>>>>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've successfully sent data to FileSystems and SOLR, but for Amazon
>>>>>>>> CloudSearch I'm seeing that only empty messages are being sent to my
>>>>>>>> domain. I think this may be an issue on how I've setup the TIKA
>>>>>>>> Extractor Transformation or the field mapping. I think the Database
>>>>>>>> where the records are supposed to be stored before flushing to
>>>>>>>> Amazon, is storing empty content.
>>>>>>>>
>>>>>>>> I've tried to find documentation on how to setup the TIKA
>>>>>>>> Transformation, but I haven't been able to find any.
>>>>>>>>
>>>>>>>> If someone could provide an example of a job setup to send from a
>>>>>>>> FileSystem to CloudSearch, that'd be great!
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>>
>>>>>>>> --
>>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>>> +56 9 84265890
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>> Full Stack Developer - MC+A Chile
>>>>>> +56 9 84265890
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Juan Pablo Diaz-Vaz Varas,
>>> Full Stack Developer - MC+A Chile
>>> +56 9 84265890
>>>
>>
>>
>
>
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
>
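
The "Encountered unexpected end of file" error quoted above is what one would expect if the chunk body reaching Amazon were empty or truncated JSON. As a rough way to check a chunk before it is sent, here is a minimal sketch in Python. It is not the connector's code. It assumes the chunk is a CloudSearch document batch, i.e. a JSON array of operations of type "add" or "delete", each with an "id", and with "fields" on an add. The function name check_batch and the problem messages are hypothetical.

```python
import json


def check_batch(text):
    """Return a list of problems found in a document batch string.

    Hypothetical helper: assumes the batch is a JSON array of
    {"type": "add"|"delete", "id": ..., "fields": {...}} operations.
    """
    try:
        batch = json.loads(text)
    except json.JSONDecodeError as exc:
        # Empty or truncated input ends up here.
        return [f"not valid JSON: {exc.msg}"]

    if not isinstance(batch, list):
        return ["batch must be a JSON array of operations"]

    problems = []
    for i, op in enumerate(batch):
        if not isinstance(op, dict):
            problems.append(f"operation {i}: not a JSON object")
            continue
        kind = op.get("type")
        if kind not in ("add", "delete"):
            problems.append(f"operation {i}: unknown type {kind!r}")
        if not op.get("id"):
            problems.append(f"operation {i}: missing id")
        # An add with no fields is the "empty record" case from the thread.
        if kind == "add" and not op.get("fields"):
            problems.append(f"operation {i}: add has no fields")
    return problems
```

Running this over the records stored before flushing would show whether the content is already empty at that stage or only becomes malformed when the chunk is assembled.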
