manifoldcf-dev mailing list archives

From "Rafa Haro" <rh...@apache.org>
Subject Re: ManifoldCF transformation connector for Apache Stanbol
Date Mon, 07 Dec 2015 11:16:00 GMT
Hi Dileepa,




As I explained to you before, with Solr (and probably with Elasticsearch too, although
it does allow you to index nested fields) you can't have nested objects or fields. Besides
that, within ManifoldCF the metadata is also expressed as key/value pairs, where a value can
be a list of objects but nothing beyond that. So it is not possible to work with complex
structures as metadata; you must flatten them first.




In a nutshell, it is not possible to maintain the relationships between entities and their
metadata. That doesn't mean it is not worthwhile to index the semantic metadata,
even if you can't relate it to a concrete entity. Indexing that information would enable
a bunch of use cases. So the proposal would be to define LDPath fields by configuration
in the transformation connector. From all the LDPath expressions you would build an LDPath
program and pass it along with the Stanbol enhancer request. When you parse the response, you
just need to go entity by entity, taking the LDPath field values returned and adding them
as metadata, using the name of the field as the key and the returned value as the value.
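
To make the idea concrete, here is a minimal sketch in plain Java of the two steps: building an LDPath program from the configured fields, and flattening the per-entity values into key/value metadata. It does not use the ManifoldCF or Stanbol APIs. The class name, the field names (entity_label, entity_type), the expressions, and the sample values are all hypothetical, and the one-rule-per-field "name = expression ;" layout assumes the prefixes used are already known to the LDPath engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LdpathFlattening {

    /** Joins the configured fields into one LDPath program, one "name = expression ;" rule per line. */
    static String buildProgram(Map<String, String> fields) {
        StringBuilder program = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            program.append(field.getKey()).append(" = ").append(field.getValue()).append(" ;\n");
        }
        return program.toString();
    }

    /** Merges per-entity field values into flat multi-valued metadata; entity boundaries are lost. */
    static Map<String, List<String>> flatten(List<Map<String, List<String>>> entities) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map<String, List<String>> entity : entities) {
            for (Map.Entry<String, List<String>> field : entity.entrySet()) {
                metadata.computeIfAbsent(field.getKey(), k -> new ArrayList<>())
                        .addAll(field.getValue());
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Hypothetical connector configuration: field name -> LDPath expression.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("entity_label", "rdfs:label");
        fields.put("entity_type", "rdf:type");
        System.out.print(buildProgram(fields));

        // Stand-in for the field values parsed from the enhancer response, one map per entity.
        Map<String, List<String>> first = new LinkedHashMap<>();
        first.put("entity_label", Arrays.asList("Paris"));
        first.put("entity_type", Arrays.asList("Place"));
        Map<String, List<String>> second = new LinkedHashMap<>();
        second.put("entity_label", Arrays.asList("Bob Marley"));
        second.put("entity_type", Arrays.asList("Person"));

        System.out.println(flatten(Arrays.asList(first, second)));
    }
}
```

Each key ends up with the values of all entities mixed together, which is exactly the loss of the entity-to-metadata relationship described above. In a real connector the resulting map would be copied onto the repository document as multi-valued metadata.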




Does that make sense?




Cheers,

Rafa

On Mon, Dec 7, 2015 at 11:17 AM, Dileepa Jayakody <djayakody@zaizi.com>
wrote:

> Hi All,
> While thanking you all for your input on Stanbol connector requirement, I
> would like to continue with modifying the Stanbol connector to be
> compatible with any output connector. If you guys can give some guidance on
> how the entity metadata should be added to the repository document I can
> modify the stanbol connector accordingly.
> From Rafa's comments, I gathered we can add the entity metadata to the
> repo.doc as key value pairs.
> However this idea is not yet clear to me. There could be 'N' number of
> entities in a document and each of them will have some common attributes
> such as name, id, type and specific attributes for particular entity type.
> I'm not clear on how to maintain that structure of N number of entities
> with their attributes in a repo.document as key value pairs and make them
> LDPath compatible for retrieval in an output connector.
> @Rafa
> If you can please elaborate on your suggestion it would be greatly helpful
> to me.
> All other suggestions are also welcome.
> Thanks,
> Dileepa
> On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
>> I, too, agree.  Somebody will need to turn this connector into one that
>> plays by the rules.  It may be possible for someone on the team here to do
>> that, but it won't be me; I'm seriously overextended at the moment.  It
>> would be best if someone who knew the connector well could do the necessary
>> work.
>>
>> Karl
>>
>>
>> On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
>>
>> > I must agree with Antonio. When I started to work on this I was expecting
>> > the connector to work by just extracting the entities and entities
>> > metadata and put them as plain metadata of the documents, probably
>> > following LDPATH queries configuration
>> >
>> >
>> >
>> >
>> > This is probably ok for Sensefy but I don’t think this could be suitable
>> > to be included in the project. But this is only my opinion. Of course, a
>> > version of the connector that fully respect the ManifoldCF architecture
>> > would be more than welcome in my opinion
>> >
>> > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
>> > <adperezmorales@gmail.com> wrote:
>> >
>> > > Hi
>> > > The removal of the SolrWrapper is a must. It was a requirement for an
>> > > internal project which has nothing to do here with a normal operation
>> > > of Manifold, so forcing the users to use Solr does not fit the Manifold
>> > > philosophy.
>> > > In my opinion, at this moment, a Stanbol connector with such a big
>> > > dependency which will not fit almost any use case is not very useful.
>> > > You should think a way to convert Stanbol connector into a normal
>> > > Transformation connector without assuming that a specific output
>> > > connector will be used.
>> > > Regards
>> > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
>> > >> Hi guys,
>> > >>
>> > >> I have developed a Stanbol connector for MCF. You can check it out
>> > >> from our github repo here:
>> > >>
>> > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
>> > >>
>> > >> It requires the SolrWrapper output connector which indexes enhanced
>> > >> documents, entities and entityTypes in separate Solr cores. Basically
>> > >> it requires 3 separate solr cores configured with a specific Solr schema
>> > >> for primary documents, entities and entityTypes separately. This was done
>> > >> for our specific use-case.
>> > >>
>> > >> The SolrWrapper code is here :
>> > >>
>> > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
>> > >>
>> > >> Perhaps we can discuss and remove the Stanbol connector's dependency
>> > >> with SolrWrapper and have it working with any output connector.
>> > >> Please note that the Stanbol connector currently has a bug in the UI
>> > >> (editSpecification) which I'm working on at the moment. After fixing
>> > >> that I will update here. And also I will provide documentations for
>> > >> configuring the connector.
>> > >>
>> > >> Thanks,
>> > >> Dileepa
>> > >>
>> > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
>> > >> adperezmorales@gmail.com> wrote:
>> > >>
>> > >> > Hi Joshua
>> > >> >
>> > >> > It is not the list for that, but Marmotta is already integrated in
>> > >> > Apache Stanbol. You can take a look at this issue
>> > >> > https://issues.apache.org/jira/browse/STANBOL-1165 .
>> > >> >
>> > >> > Anyway, as I said this is not the list for that, so let's use the
>> > >> > proper list for these things.
>> > >> >
>> > >> > Regards
>> > >> >
>> > >> >
>> > >> >
>> > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
>> > >> >
>> > >> > > Hey Dileepa,
>> > >> > >
>> > >> > >       In case you were interested, I pinged the list a few days
>> > >> > > ago asking for integration tips for Apache Marmotta.
>> > >> > >
>> > >> > > I got some great tips on how to do this which could help you. Since
>> > >> > > Marmotta is a drop in replacement for Clerezza on Stanbol it may be
>> > >> > > easier for you to take this way.
>> > >> > >
>> > >> > > I'm not a Java programmer but I'm bringing this problem to the
>> > >> > > development staff at my company for assistance. If you like the
>> > >> > > Marmotta approach we may gain more traction solving the same
>> > >> > > integration.
>> > >> > >
>> > >> > > I'm also integrating Marmotta with Stanbol so the effect would be
>> > >> > > the same except not using the Stanbol API for data import in favor
>> > >> > > of Marmotta.
>> > >> > >
>> > >> > > Best,
>> > >> > >
>> > >> > > -J
>> > >> > >
>> > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com>
>> > >> > > wrote:
>> > >> > > >
>> > >> > > > Hi all,
>> > >> > > >
>> > >> > > > Thank you for the feedback and offering your help in this.
>> > >> > > > Let me get back to you on where to start the code base.
>> > >> > > > As the first step, I would like to start by creating an
>> > >> > > > architecture diagram for the connector.
>> > >> > > > I will send the diagram for your review soon.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Dileepa
>> > >> > > >
>> > >> > > > --
>> > >> > > >
>> > >> > > > ------------------------------
>> > >> > > > This message should be regarded as confidential. If you have
>> > >> > > > received this email in error please notify the sender and destroy
>> > >> > > > it immediately. Statements of intent shall only become binding when
>> > >> > > > confirmed in hard copy by an authorised signatory.
>> > >> > > >
>> > >> > > > Zaizi Ltd is registered in England and Wales with the registration
>> > >> > > > number 6440931. The Registered Office is Brook House, 229 Shepherds
>> > >> > > > Bush Road, London W6 7AN.
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> >
>>