manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF transformation connector for Apache Stanbol
Date Mon, 07 Dec 2015 11:33:46 GMT
It makes sense to me, anyway. :-)
It sounds like Stanbol just has hierarchical attributes, rather than actual
documents.

Karl

On Mon, Dec 7, 2015 at 6:16 AM, Rafa Haro <rharo@apache.org> wrote:

> Hi Dileepa,
>
> As I explained to you before, with Solr you can't have nested objects or
> fields (this is probably also true of Elasticsearch, although it does let
> you index nested fields). Besides that, within ManifoldCF the metadata is
> expressed as key/value pairs, where a value can be a list of objects but
> nothing beyond that. So it is not possible to work with complex structures
> as metadata; you must flatten them first.
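
The flattening step described above could look roughly like the following sketch. It is plain Java with no ManifoldCF or Stanbol dependency; the class name `MetadataFlattener`, the `_` key separator and the sample entity attributes are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of ManifoldCF or Stanbol.
final class MetadataFlattener {

    /** Flattens nested maps and lists into key -> list-of-strings pairs. */
    static Map<String, List<String>> flatten(Map<String, ?> nested) {
        Map<String, List<String>> flat = new LinkedHashMap<>();
        collect("", nested, flat);
        return flat;
    }

    private static void collect(String key, Object value, Map<String, List<String>> flat) {
        if (value instanceof Map) {
            // Descend into nested objects, joining key segments with "_".
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                String child = key.isEmpty() ? String.valueOf(e.getKey()) : key + "_" + e.getKey();
                collect(child, e.getValue(), flat);
            }
        } else if (value instanceof Iterable) {
            // List items share the parent key, so their values are merged.
            for (Object item : (Iterable<?>) value) {
                collect(key, item, flat);
            }
        } else if (value != null) {
            flat.computeIfAbsent(key, k -> new ArrayList<>()).add(value.toString());
        }
    }

    public static void main(String[] args) {
        // Sample data (assumed): two entities with a name and a type each.
        Map<String, Object> paris = new LinkedHashMap<>();
        paris.put("name", "Paris");
        paris.put("type", "Place");
        Map<String, Object> curie = new LinkedHashMap<>();
        curie.put("name", "Marie Curie");
        curie.put("type", "Person");

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("entities", Arrays.asList(paris, curie));

        System.out.println(flatten(doc));
    }
}
```

The flat result holds `entities_name` and `entities_type` as multi-valued fields. That shows the limitation: nothing ties a given name to its type except list position, and that breaks as soon as one entity has a missing or multi-valued attribute. In a real connector, each key and its values would presumably be passed to `RepositoryDocument.addField`.
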
>
> In a nutshell, it is not possible to maintain the relationships between
> entities and their metadata. That doesn't mean that it is not interesting
> to index the semantic metadata, even if you can't relate it to a concrete
> entity. Indexing that information would enable a bunch of use cases. So the
> proposal would be to define LDPath fields by configuration in the
> transformation connector. From all the LDPath expressions you would build
> an LDPath program to pass along with the Stanbol enhancer request. When you
> parse the response, you just need to go entity by entity, taking the
> returned LDPath field values and adding them as metadata, using the field
> name as the key and the returned value as the value.
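
A minimal sketch of that proposal, under stated assumptions: the configured fields are held as a map from field name to LDPath expression, the program is a series of `@prefix` declarations followed by one `name = path ;` statement per field, and the parsed Stanbol response is already available as one map of field values per entity. The class `LdPathFields`, its method names and the sample field names are hypothetical. How the program is sent to the enhancer and how the response is parsed are deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of ManifoldCF or Stanbol.
final class LdPathFields {

    /** Builds an LDPath program from configured prefixes and "field -> path" entries. */
    static String buildProgram(Map<String, String> prefixes, Map<String, String> fields) {
        StringBuilder program = new StringBuilder();
        for (Map.Entry<String, String> p : prefixes.entrySet()) {
            program.append("@prefix ").append(p.getKey())
                   .append(" : <").append(p.getValue()).append("> ;\n");
        }
        for (Map.Entry<String, String> f : fields.entrySet()) {
            program.append(f.getKey()).append(" = ").append(f.getValue()).append(" ;\n");
        }
        return program.toString();
    }

    /**
     * Walks the per-entity results and merges the values of each configured
     * field into flat metadata keyed by field name.
     */
    static Map<String, List<String>> toMetadata(
            List<Map<String, List<String>>> entityResults, Iterable<String> fieldNames) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        for (Map<String, List<String>> entity : entityResults) {
            for (String field : fieldNames) {
                List<String> values = entity.get(field);
                if (values != null && !values.isEmpty()) {
                    metadata.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values);
                }
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        // Assumed configuration: one prefix and one field definition.
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("entity_label", "rdfs:label :: xsd:string");
        System.out.print(buildProgram(prefixes, fields));
    }
}
```

Only the configured field names are copied into the metadata, so anything else the response carries for an entity is ignored. Values from different entities end up in the same multi-valued field, which matches the point above that the entity-to-attribute relationship is not preserved.
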
>
> Does that make sense?
>
> Cheers,
>
> Rafa
>
> On Mon, Dec 7, 2015 at 11:17 AM, Dileepa Jayakody <djayakody@zaizi.com>
> wrote:
>
> > Hi All,
> > Thank you all for your input on the Stanbol connector requirement. I
> > would like to continue modifying the Stanbol connector so that it is
> > compatible with any output connector. If you can give some guidance on
> > how the entity metadata should be added to the repository document, I can
> > modify the Stanbol connector accordingly.
> >
> > From Rafa's comments, I gathered we can add the entity metadata to the
> > repository document as key/value pairs.
> > However, this idea is not yet clear to me. There could be N entities in a
> > document, and each of them will have some common attributes such as name,
> > id and type, plus attributes specific to its entity type. I'm not clear on
> > how to maintain that structure of N entities with their attributes in a
> > repository document as key/value pairs and make them LDPath compatible for
> > retrieval in an output connector.
> >
> > @Rafa
> > If you could elaborate on your suggestion, it would be a great help to
> > me. All other suggestions are also welcome.
> > Thanks,
> > Dileepa
> > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> >> I, too, agree.  Somebody will need to turn this connector into one that
> >> plays by the rules.  It may be possible for someone on the team here to do
> >> that, but it won't be me; I'm seriously overextended at the moment.  It
> >> would be best if someone who knew the connector well could do the
> >> necessary work.
> >>
> >> Karl
> >>
> >>
> >> On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> >>
> >> > I must agree with Antonio. When I started to work on this, I was
> >> > expecting the connector to work by just extracting the entities and
> >> > their metadata and putting them as plain metadata on the documents,
> >> > probably following an LDPath query configuration.
> >> >
> >> > This is probably OK for Sensefy, but I don't think it is suitable for
> >> > inclusion in the project. But this is only my opinion. Of course, a
> >> > version of the connector that fully respects the ManifoldCF
> >> > architecture would be more than welcome.
> >> >
> >> > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> >> > <adperezmorales@gmail.com> wrote:
> >> >
> >> > > Hi
> >> > > The removal of the SolrWrapper is a must. It was a requirement for an
> >> > > internal project and has nothing to do with the normal operation of
> >> > > Manifold; forcing users to use Solr does not fit the Manifold
> >> > > philosophy.
> >> > > In my opinion, at this moment, a Stanbol connector with such a big
> >> > > dependency, which will fit almost no use case, is not very useful.
> >> > > You should think of a way to convert the Stanbol connector into a normal
> >> > > transformation connector that does not assume a specific output
> >> > > connector will be used.
> >> > > Regards
> >> > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> >> > >> Hi guys,
> >> > >>
> >> > >> I have developed a Stanbol connector for MCF. You can check it out
> >> > >> from our GitHub repo here:
> >> > >>
> >> > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> >> > >>
> >> > >> It requires the SolrWrapper output connector, which indexes enhanced
> >> > >> documents, entities and entityTypes in separate Solr cores. Basically,
> >> > >> it requires three separate Solr cores, each configured with a specific
> >> > >> Solr schema: one for primary documents, one for entities and one for
> >> > >> entityTypes. This was done for our specific use case.
> >> > >>
> >> > >> The SolrWrapper code is here:
> >> > >>
> >> > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> >> > >>
> >> > >> Perhaps we can discuss removing the Stanbol connector's dependency
> >> > >> on SolrWrapper so that it works with any output connector.
> >> > >> Please note that the Stanbol connector currently has a bug in the UI
> >> > >> (editSpecification), which I'm working on at the moment. After fixing
> >> > >> that I will post an update here. I will also provide documentation for
> >> > >> configuring the connector.
> >> > >>
> >> > >> Thanks,
> >> > >> Dileepa
> >> > >>
> >> > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <
> >> > >> adperezmorales@gmail.com> wrote:
> >> > >>
> >> > >> > Hi Joshua
> >> > >> >
> >> > >> > This is not the list for that, but Marmotta is already integrated in
> >> > >> > Apache Stanbol. You can take a look at this issue:
> >> > >> > https://issues.apache.org/jira/browse/STANBOL-1165
> >> > >> >
> >> > >> > Anyway, as I said, this is not the list for that, so let's use the
> >> > >> > proper list for these things.
> >> > >> >
> >> > >> > Regards
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> >> > >> >
> >> > >> > > Hey Dileepa,
> >> > >> > >
> >> > >> > > In case you were interested, I pinged the list a few days ago
> >> > >> > > asking for integration tips for Apache Marmotta.
> >> > >> > >
> >> > >> > > I got some great tips on how to do this which could help you. Since
> >> > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol, it may be
> >> > >> > > easier for you to go this way.
> >> > >> > >
> >> > >> > > I'm not a Java programmer, but I'm bringing this problem to the
> >> > >> > > development staff at my company for assistance. If you like the
> >> > >> > > Marmotta approach, we may gain more traction solving the same
> >> > >> > > integration.
> >> > >> > >
> >> > >> > > I'm also integrating Marmotta with Stanbol, so the effect would be the
> >> > >> > > same, except that data import would use Marmotta rather than the
> >> > >> > > Stanbol API.
> >> > >> > >
> >> > >> > > Best,
> >> > >> > >
> >> > >> > > -J
> >> > >> > >
> >> > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> >> > >> > > >
> >> > >> > > > Hi all,
> >> > >> > > >
> >> > >> > > > Thank you for the feedback and for offering your help with this.
> >> > >> > > > Let me get back to you on where to start the code base.
> >> > >> > > > As the first step, I would like to create an architecture diagram
> >> > >> > > > for the connector.
> >> > >> > > > I will send the diagram for your review soon.
> >> > >> > > >
> >> > >> > > > Thanks,
> >> > >> > > > Dileepa
> >> > >> > > >
> >> > >> > > > --
> >> > >> > > >
> >> > >> > > > ------------------------------
> >> > >> > > > This message should be regarded as confidential. If you have
> >> > >> > > > received this email in error please notify the sender and destroy it
> >> > >> > > > immediately. Statements of intent shall only become binding when
> >> > >> > > > confirmed in hard copy by an authorised signatory.
> >> > >> > > >
> >> > >> > > > Zaizi Ltd is registered in England and Wales with the registration
> >> > >> > > > number 6440931. The Registered Office is Brook House, 229 Shepherds
> >> > >> > > > Bush Road, London W6 7AN.
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >>
> >> >
> >>
>
