manifoldcf-user mailing list archives

From Juan Pablo Diaz-Vaz <jpdiaz...@mcplusa.com>
Subject Re: Amazon CloudSearch Connector question
Date Tue, 09 Feb 2016 13:49:31 GMT
Hi,

The patch worked, and the POST now at least has content. Amazon is
responding with a Parsing Error, though.

I logged the message before it gets posted to Amazon, and it is not valid
JSON: it has extra commas and parenthesis characters where records are
concatenated. I don't know if this is an issue with my setup or with the
JSONArrayReader.

[{
"id": "100D84BAF0BF348EC6EC593E5F5B1F49585DF555",
"type": "add",
"fields": {
 <record fields>
}
}, , {
"id": "1E6DC8BA1E42159B14658321FDE0FC2DC467432C",
"type": "add",
"fields": {
 <record fields>
}
}, , , , , , , , , , , , , , , , {
"id": "92C7EDAD8398DAC797A7DEA345C1859E6E9897FB",
"type": "add",
"fields": {
 <record fields>
}
}, , , ]
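
For reference, a pattern like that can appear when a batch writer emits a
separator for every queued record even though some records end up contributing
no JSON at all. The sketch below is a hypothetical, plain-Java illustration of
that failure mode and of the usual guard (only write the comma before a record
that actually has content); it is not the connector's or JSONArrayReader's
actual code, and the class and method names are made up.

import java.util.List;

public class BatchJsonSketch {

  // Naive version: one separator per slot, so records that turn out to be
  // empty leave behind stray ", ," runs like the ones in the sample above.
  public static String buildNaive(List<String> recordJsons) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < recordJsons.size(); i++) {
      if (i > 0) sb.append(", ");
      sb.append(recordJsons.get(i) == null ? "" : recordJsons.get(i));
    }
    return sb.append("]").toString();
  }

  // Guarded version: skip empty records entirely and only emit the separator
  // before a record that has content.
  public static String buildGuarded(List<String> recordJsons) {
    StringBuilder sb = new StringBuilder("[");
    boolean first = true;
    for (String record : recordJsons) {
      if (record == null || record.isEmpty())
        continue;
      if (!first) sb.append(", ");
      sb.append(record);
      first = false;
    }
    return sb.append("]").toString();
  }
}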

Thanks,

On Mon, Feb 8, 2016 at 7:17 PM, Juan Pablo Diaz-Vaz <jpdiazvaz@mcplusa.com>
wrote:

> Thanks! I'll apply it and let you know how it goes.
>
> On Mon, Feb 8, 2016 at 6:51 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Ok, I have a patch.  It's actually pretty tiny; the bug is in our code,
>> not Commons-IO, but a change in Commons-IO's behavior exposed it.
>>
>> I've created a ticket (CONNECTORS-1271) and attached the patch to it.
>>
>> Thanks!
>> Karl
>>
>>
>> On Mon, Feb 8, 2016 at 4:27 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> I have chased this down to a completely broken Apache Commons-IO
>>> library.  It no longer works with the JSONReader objects in ManifoldCF at
>>> all, and refuses to read anything from them.  Unfortunately I can't change
>>> versions of that library because other things depend upon it. So I'll need
>>> to write my own code to replace its functionality.  That will take some
>>> amount of time to do.
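
(A hypothetical aside, not the CONNECTORS-1271 patch: the functionality being
replaced here is essentially "drain a java.io.Reader fully", which a plain
buffer loop can do without Commons-IO. The helper name below is made up.)

import java.io.IOException;
import java.io.Reader;

public final class ReaderDrain {
  // Read everything the Reader produces into a String, 4 KB at a time.
  public static String readFully(Reader reader) throws IOException {
    StringBuilder sb = new StringBuilder();
    char[] buffer = new char[4096];
    int count;
    while ((count = reader.read(buffer)) != -1) {
      sb.append(buffer, 0, count);
    }
    return sb.toString();
  }
}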
>>>
>>> This probably happened the last time our dependencies were updated.  My
>>> apologies.
>>>
>>> Karl
>>>
>>>
>>> On Mon, Feb 8, 2016 at 4:18 PM, Juan Pablo Diaz-Vaz <
>>> jpdiazvaz@mcplusa.com> wrote:
>>>
>>>> Thanks,
>>>>
>>>> Don't know if it'll help, but removing the usage of JSONObjectReader on
>>>> addOrReplaceDocumentWithException and posting to Amazon chunk-by-chunk
>>>> instead of using the JSONArrayReader on flushDocuments, changed the error I
>>>> was getting from Amazon.
>>>>
>>>> Maybe those objects are failing on parsing the content to JSON.
>>>>
>>>> On Mon, Feb 8, 2016 at 6:04 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>
>>>>> Ok, I'm debugging away, and I can confirm that no data is getting
>>>>> through.  I'll have to open a ticket and create a patch when I find the
>>>>> problem.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Mon, Feb 8, 2016 at 3:15 PM, Juan Pablo Diaz-Vaz <
>>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> On Mon, Feb 8, 2016 at 5:13 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok, thanks, this is helpful -- it clearly sounds like Amazon is
>>>>>>> unhappy about the JSON format we are sending it.  The deprecation message
>>>>>>> is probably a strong clue.  I'll experiment here with logging document
>>>>>>> contents so that I can give you further advice.  Stay tuned.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 8, 2016 at 3:07 PM, Juan Pablo Diaz-Vaz <
>>>>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>>>>
>>>>>>>> I'm actually not seeing anything on Amazon. The CloudSearch
>>>>>>>> connector fails when sending the request to Amazon CloudSearch:
>>>>>>>>
>>>>>>>> AmazonCloudSearch: Error sending document chunk 0: '{"status":
>>>>>>>> "error", "errors": [{"message": "[*Deprecated*: Use the outer message
>>>>>>>> field] Encountered unexpected end of file"}], "adds": 0, "__type":
>>>>>>>> "#DocumentServiceException", "message": "{ [\"Encountered unexpected end of
>>>>>>>> file\"] }", "deletes": 0}'
>>>>>>>>
>>>>>>>> ERROR 2016-02-08 20:04:16,544 (Job notification thread) -
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 8, 2016 at 5:00 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> If you can possibly include a snippet of the JSON you are seeing
>>>>>>>>> on the Amazon end, that would be great.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 8, 2016 at 2:45 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> More likely this is a bug.
>>>>>>>>>>
>>>>>>>>>> I take it that it is the body string that is not coming out,
>>>>>>>>>> correct?  Do all the other JSON fields look reasonable?  Does the body
>>>>>>>>>> clause exist and is just empty, or is it not there at all?
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 8, 2016 at 2:36 PM, Juan Pablo Diaz-Vaz <
>>>>>>>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> When running a copy of the job, but with SOLR as a target, I'm
>>>>>>>>>>> seeing the expected content being posted to SOLR, so it may not be an
>>>>>>>>>>> issue with TIKA. After adding some more logging to the CloudSearch
>>>>>>>>>>> connector, I think the data is getting lost just before it is passed to the
>>>>>>>>>>> DocumentChunkManager, which inserts the empty records into the DB. Could it
>>>>>>>>>>> be that the JSONObjectReader doesn't like my data?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 8, 2016 at 3:48 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Juan,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd try to reproduce as much of the pipeline as possible using
>>>>>>>>>>>> a solr output connection.  If you include the tika extractor in the
>>>>>>>>>>>> pipeline, you will want to configure the solr connection to not use the
>>>>>>>>>>>> extracting update handler.  There's a checkbox on the Schema tab you need
>>>>>>>>>>>> to uncheck for that.  But if you do that you can see pretty exactly what is
>>>>>>>>>>>> being sent to Solr; it all gets logged in the INFO messages dumped to the
>>>>>>>>>>>> solr log.  This should help you figure out whether the problem is your tika
>>>>>>>>>>>> configuration or not.
>>>>>>>>>>>>
>>>>>>>>>>>> Please give this a try and let me know what happens.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Feb 8, 2016 at 1:28 PM, Juan Pablo Diaz-Vaz <
>>>>>>>>>>>> jpdiazvaz@mcplusa.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've successfully sent data to FileSystems and SOLR, but for
>>>>>>>>>>>>> Amazon CloudSearch I'm seeing that only empty messages are being sent
>>>>>>>>>>>>> to my domain. I think this may be an issue with how I've set up the TIKA
>>>>>>>>>>>>> Extractor Transformation or the field mapping. I think the database where
>>>>>>>>>>>>> the records are supposed to be stored before flushing to Amazon is storing
>>>>>>>>>>>>> empty content.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've tried to find documentation on how to set up the TIKA
>>>>>>>>>>>>> Transformation, but I haven't been able to find any.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If someone could provide an example of a job setup to send
>>>>>>>>>>>>> from a FileSystem to CloudSearch, that'd be great!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>>>>>>>> +56 9 84265890
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>>>>>> +56 9 84265890
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>>>> Full Stack Developer - MC+A Chile
>>>>>>>> +56 9 84265890
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Juan Pablo Diaz-Vaz Varas,
>>>>>> Full Stack Developer - MC+A Chile
>>>>>> +56 9 84265890
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Juan Pablo Diaz-Vaz Varas,
>>>> Full Stack Developer - MC+A Chile
>>>> +56 9 84265890
>>>>
>>>
>>>
>>
>
>
> --
> Juan Pablo Diaz-Vaz Varas,
> Full Stack Developer - MC+A Chile
> +56 9 84265890
>



-- 
Juan Pablo Diaz-Vaz Varas,
Full Stack Developer - MC+A Chile
+56 9 84265890
