manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ElasticSearch Mappings Question
Date Mon, 15 Jul 2013 07:21:37 GMT
Hi Hermo,

(1) sounds like the right approach to me.  The one thing you'd need to
work out is the mechanics: unless I misunderstand something, the JSON the
connector sends to ES would need values from the document being indexed
substituted into it.

Karl
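
The substitution Karl describes could be sketched roughly as follows. This is
only an illustration, not the connector's actual code: the template format,
the "${field}" placeholder syntax, the render() helper, and the target
property names other than _productname are all assumptions made up for the
example.

```python
import json

# Hypothetical template: keys are the target ES property names, values are
# either literals or "${field}" placeholders naming a source document field.
TEMPLATE = {
    "_productname": "${_name}",
    "mimetype": "${_content_type}",
    "modified": "${lastModified}",
    "source": "manifoldcf",
}

def render(template, document):
    """Build the JSON body for one document by filling in placeholders."""
    body = {}
    for target, value in template.items():
        is_placeholder = (
            isinstance(value, str)
            and value.startswith("${")
            and value.endswith("}")
        )
        if is_placeholder:
            source_field = value[2:-1]
            # Skip properties whose source field is absent from this document.
            if source_field in document:
                body[target] = document[source_field]
        else:
            # Literal values are copied through unchanged.
            body[target] = value
    return json.dumps(body, sort_keys=True)

doc = {
    "_name": "report.docx",
    "_content_type": "application/msword",
    "lastModified": "2013-07-12T09:30:00Z",
}
print(render(TEMPLATE, doc))
```

Substituting into a structure and serializing afterwards, rather than splicing
values into a JSON string, leaves the escaping of quotes and other special
characters to the JSON serializer.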



On Mon, Jul 15, 2013 at 1:21 AM, Hermo <hermo.terblanche@euroling.se> wrote:

> Thanks Richard and Karl!
>
> I read up about CONNECTORS-690 and it gave me a good background regarding
> the ElasticSearch and MCF integration.
>
> Regarding the jira ticket, I am not yet sure how things work.
> I started with MCF only a week ago, with the task of researching how we
> could use MCF to crawl Windows Shares and SharePoint sites and index the
> data in ElasticSearch in a schema that our search product can understand
> and query.
>
> I am trying to figure out the right place to implement a translation
> from connector-specific JSON properties to our proprietary product
> properties in ES.
>
> Options I have thought about, most likely first:
>
> 1) A translation config file (configuration tab in UI) for the connector.
> (In my opinion this sounds like the right place for it?)
> 2) A plugin in ES that could do the translation from one schema to the
> other. (Not sure how plugins in ES work or what they do)
> 3) A custom build of the connector with our own schema coded into the
> connector (not desirable for upgrade reasons)
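
A minimal sketch of what option 1 might look like, assuming the translation
is a plain list of "source = target" lines. The file format and the helper
names parse_translation() and translate() are hypothetical, and "modified" is
an invented target name; none of this is an existing ManifoldCF feature.

```python
def parse_translation(text):
    """Parse 'source = target' lines; blank lines and '#' comments are ignored."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, sep, target = line.partition("=")
        if not sep:
            # Ignore malformed lines that have no '=' separator.
            continue
        mapping[source.strip()] = target.strip()
    return mapping

def translate(fields, mapping):
    """Rename keys found in the mapping; pass all other fields through."""
    return {mapping.get(name, name): value for name, value in fields.items()}

# Hypothetical configuration text, as it might be entered in a UI tab.
CONFIG = """
# connector property = product property
_name = _productname
lastModified = modified
"""

fields = {"_name": "report.docx", "lastModified": "2013-07-12", "type": "docs"}
print(translate(fields, parse_translation(CONFIG)))
```

Fields without an entry in the mapping keep their connector-assigned names,
so an empty configuration would leave the current behavior unchanged.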
>
> Architecturally I would like to hear your recommendations before creating
> the jira ticket.
>
> Regards,
>
> Hermo
>
> On 12 Jul 2013, at 23:12 , "Nichols, Richard" <Richard.Nichols@tellabs.com>
> wrote:
>
> > Hermo,
> >
> > Be aware that changes for CONNECTORS-690 (which I believe will be in the
> > next release) will expect that you have already set up a specific default
> > mapping.  I realize this doesn't actually address your issue, but you
> > should be aware of it.
> >
> > Rick
> >
> > -----Original Message-----
> > From: Karl Wright [mailto:daddywri@gmail.com]
> > Sent: Friday, July 12, 2013 9:26 AM
> > To: Hermo; user@manifoldcf.apache.org
> > Subject: RE: ElasticSearch Mappings Question
> >
> > Hi Hermo,
> > The mapping is determined by the elasticsearch connector. As of this
> > time, there is no way to change it other than to change the connector
> > code.
> >
> > If you want such a feature, please create a jira ticket describing what
> > you think the connector should be able to do.
> >
> > Thanks,
> > Karl
> >
> > From: Hermo
> > Sent: 7/12/2013 9:30 AM
> > To: user@manifoldcf.apache.org
> > Subject: ElasticSearch Mappings Question
> > Hi,
> >
> > I have the following scenario:
> >
> > I configured a Job with a Windows Share repository connector, and an
> > ElasticSearch output connector.
> >
> > It seems that when a file in the share is crawled, it is ingested into
> > ElasticSearch with a very specific mapping, shown below, where "myindex"
> > is the name of the index and "docs" is the type:
> >
> > {
> >   "myindex" : {
> >     "docs" : {
> >       "properties" : {
> >         "_content_type" : {
> >           "type" : "string"
> >         },
> >         "_name" : {
> >           "type" : "string"
> >         },
> >         "allow_token_document" : {
> >           "type" : "string"
> >         },
> >         "allow_token_share" : {
> >           "type" : "string"
> >         },
> >         "deny_token_document" : {
> >           "type" : "string"
> >         },
> >         "deny_token_share" : {
> >           "type" : "string"
> >         },
> >         "file" : {
> >           "type" : "string"
> >         },
> >         "lastModified" : {
> >           "type" : "string"
> >         },
> >         "type" : {
> >           "type" : "string"
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > I have the following questions:
> > 1) What determines this mapping: the repository connector (in the above
> > scenario, the Windows Share connector) or the ElasticSearch connector?
> > 2) Is there a way to specify my own mapping? For example, I would like
> > to map the _name property to something different, such as _productname.
> > Where would be the correct place to do this?
> >
> > Looking forward to any help and suggestions.
> >
> > Regards,
> > Hermo Terblanche
> >
> >
> >
>
>
