manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ElasticSearch Oddities
Date Fri, 07 Jun 2013 16:56:02 GMT
CONNECTORS-707 and CONNECTORS-708.

Karl



On Fri, Jun 7, 2013 at 12:48 PM, Karl Wright <daddywri@gmail.com> wrote:

> >>>>>>
> 1)      I didn’t set the “Allowed MIME Types” on the ES tab in the job to
> allow “application/xml”.  I was expecting to have all of the rows filtered
> out.  That didn’t happen.  All rows returned were indexed by ES anyway.
> <<<<<<
>
> That's probably because the JDBC connector does not call the appropriate
> method to check whether the mimetype will be accepted by the output
> connector or not.  It's up to the repository connector to do this, and is
> optional.  But this is worth creating a ticket for I think.
>
>
> >>>>>>
>  2)      Some of the columns (which are of type nvarchar) have embedded
> linefeed and/or return characters in them (e.g. multi-line addresses).
> These are getting flagged as JSON errors by ES (as containing an ‘unescaped
> character’).  I see that ElasticSearchIndex::jsonStringEscape() doesn’t
> deal with non-printable characters.  Should it?
> <<<<<<
>
>
> Yes.  This one definitely should have a ticket.
>
>
> Karl
>
>
>
>
> On Fri, Jun 7, 2013 at 12:43 PM, Nichols, Richard <
> Richard.Nichols@tellabs.com> wrote:
>
>> Karl,
>>
>> Now that we have MCF sending documents to ES so that they are properly
>> being scanned, I’m finding a couple of oddities.
>>
>> I’m using the JDBC connector to feed ES, where the main ‘document’
>> (identified by the $(DATACOLUMN) variable) is in XML.  Therefore, I set the
>> $(CONTENTTYPE) column to ‘application/xml’.  Generally, this works.  But…
>>
>> 1)      I didn’t set the “Allowed MIME Types” on the ES tab in the
>> job to allow “application/xml”.  I was expecting to have all of the rows
>> filtered out.  That didn’t happen.  All rows returned were indexed by ES
>> anyway.
>>
>> 2)      Some of the columns (which are of type nvarchar) have
>> embedded linefeed and/or return characters in them (e.g. multi-line
>> addresses).  These are getting flagged as JSON errors by ES (as containing
>> an ‘unescaped character’).  I see that
>> ElasticSearchIndex::jsonStringEscape() doesn’t deal with non-printable
>> characters.  Should it?
>>
>> Regards,
>>
>> Rick
>>
>> Richard D. Nichols
>> Staff Engineer
>> Tellabs, Inc.
>> 18583 N. Dallas Parkway
>> Dallas, TX  75287
>> Office: (972) 588-6942
>> richard.nichols@tellabs.com
>>
>
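On point 1, the thread says it is up to the repository connector to ask whether the output connector will accept a document's MIME type before sending it. A minimal sketch of that pattern follows. The interface OutputMimeTypeCheck, the Row class, and the method selectIndexable are all hypothetical stand-ins invented for illustration; they are not the ManifoldCF API and not the JDBC connector's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MimeTypeFilterSketch {
    private MimeTypeFilterSketch() {}

    /** Hypothetical stand-in for "will the output connector accept this MIME type?". */
    interface OutputMimeTypeCheck {
        boolean isIndexable(String mimeType);
    }

    /** Hypothetical result row: an identifier plus the value of the content-type column. */
    static final class Row {
        final String id;
        final String contentType;

        Row(String id, String contentType) {
            this.id = id;
            this.contentType = contentType;
        }
    }

    /** Keeps only the rows whose content type the output side says it will accept. */
    static List<Row> selectIndexable(List<Row> rows, OutputMimeTypeCheck check) {
        List<Row> accepted = new ArrayList<>();
        for (Row row : rows) {
            if (check.isIndexable(row.contentType)) {
                accepted.add(row);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Assumed job configuration: only text/plain is on the allowed list.
        Set<String> allowed = new HashSet<>(Arrays.asList("text/plain"));
        List<Row> rows = Arrays.asList(
            new Row("1", "application/xml"),
            new Row("2", "text/plain"));
        for (Row row : selectIndexable(rows, allowed::contains)) {
            System.out.println("would index row " + row.id);
        }
    }
}
```

If the repository side never performs this check, every row reaches the index regardless of the allowed list, which matches the behaviour reported above.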
>
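On point 2, JSON does not allow raw control characters (anything below U+0020) inside a string, so linefeeds and carriage returns have to be written as escape sequences. The following is a minimal sketch of an escaper that handles them. The class name JsonStringEscaper is hypothetical, and this is neither the existing ElasticSearchIndex.jsonStringEscape() code nor the fix tracked in the tickets above.

```java
final class JsonStringEscaper {
    private JsonStringEscaper() {}

    /** Escapes quotes, backslashes, and all control characters below U+0020. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters have no short form; use \\uXXXX.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A multi-line address of the kind stored in an nvarchar column.
        String address = "1 Main Street\r\nSpringfield";
        System.out.println("{\"address\":\"" + escape(address) + "\"}");
    }
}
```

Characters at or above U+0020 other than the quote and backslash pass through unchanged, so non-ASCII text in the column is preserved.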
