manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF transformation connector for Apache Stanbol
Date Mon, 07 Dec 2015 12:59:11 GMT
Hi Dileepa,

You cannot create sub-documents in a transformation connector.  Adding
that capability to the framework is not possible either; we would be
missing key bookkeeping logic if that were allowed.

Karl


On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com>
wrote:

> Hi Karl,
>
> Thanks a lot for the pointer.
>
> Stanbol doesn't update an existing document; it generates a new response
> containing the requested enhancement details for each content enhancement
> request. For example, for a request like "Paris is in France", Stanbol
> returns the RDF response in [1].
>
> In the Stanbol connector, enhancement artifacts such as TextAnnotations
> and EntityAnnotations are extracted from the RDF response, to generate the
> entity abstractions and add them to the mcf repository document. Currently
> in the Stanbol connector we have added these entity abstractions as JSON
> strings to a multi-valued 'entities' field in the repository document and
> we parse that JSON in the SolrWrapper output connector to index in separate
> Solr cores (primary documents, linked entities and entity types with their
> attributes).
>
> Can we have a primary repository document and create sub-documents for
> the extracted entities? Is it possible to generate sub-documents for a
> repository document in a transformation connector?
>
> Thanks.
> Dileepa
>
> [1] Sample Stanbol response
>
> {
>   "@context": {
>     "dbp-ont": "http://dbpedia.org/ontology/",
>     "dc": "http://purl.org/dc/terms/",
>     "dc:created": {
>       "@type": "xsd:dateTime"
>     },
>     "enhancer": "http://fise.iks-project.eu/ontology/",
>     "enhancer:confidence": {
>       "@type": "xsd:double"
>     },
>     "enhancer:end": {
>       "@type": "xsd:int"
>     },
>     "enhancer:entity-reference": {
>       "@type": "@id"
>     },
>     "enhancer:entity-type": {
>       "@type": "@id"
>     },
>     "enhancer:extracted-from": {
>       "@type": "@id"
>     },
>     "enhancer:start": {
>       "@type": "xsd:int"
>     },
>     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
>     "foaf": "http://xmlns.com/foaf/0.1/",
>     "foaf:depiction": {
>       "@type": "@id"
>     },
>     "owl": "http://www.w3.org/2002/07/owl#",
>     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
>     "schema": "http://schema.org/",
>     "xsd": "http://www.w3.org/2001/XMLSchema#"
>   },
>   "@graph": [
>     {
>       "@id": "http://dbpedia.org/resource/France",
>       "@type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Country",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
>       ],
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
>       },
>       "rdfs:label": [
>         {
>           "@language": "en",
>           "@value": "France"
>         },
>         {
>           "@language": "fr",
>           "@value": "France"
>         }
>       ]
>     },
>
>     {
>       "@id": "http://dbpedia.org/resource/Paris",
>       "@type": [
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "dbp-ont:Settlement",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing",
>         "schema:Place"
>       ],
>       "foaf:depiction": [
>         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
>         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
>       ],
>       "geo:lat": 48.8567,
>       "geo:long": 2.3508,
>       "rdfs:comment": {
>         "@language": "en",
>         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
>       },
>       "rdfs:label": [
>
>         {
>           "@language": "en",
>           "@value": "Paris"
>         },
>         {
>           "@language": "fr",
>           "@value": "Paris"
>         }
>       ]
>     },
>     {
>       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator":
>
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.6017613,
>       "enhancer:end": 5,
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": {
>         "@language": "en",
>         "@value": "Paris"
>       },
>       "enhancer:selection-context": {
>         "@language": "en",
>         "@value": "Paris is in France"
>       },
>       "enhancer:start": 0
>     },
>     {
>       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator":
>
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation":
> "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 1.0,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "France"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator":
>
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation":
> "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "enhancer:confidence": 0.25715446,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "Vichy France"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "http://www.opengis.net/gml/_Feature",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:EntityAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.748Z",
>       "dc:creator":
>
> "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
>       "dc:relation":
> "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
>       "enhancer:confidence": 0.1493264,
>       "enhancer:entity-label": {
>         "@language": "en",
>         "@value": "Paris Commune"
>       },
>       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
>       "enhancer:entity-type": [
>         "dbp-ont:Country",
>         "dbp-ont:Place",
>         "dbp-ont:PopulatedPlace",
>         "schema:Country",
>         "schema:Place",
>         "owl:Thing"
>       ],
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "entityhub:site": "dbpedia"
>     },
>     {
>       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
>       "@type": [
>         "enhancer:Enhancement",
>         "enhancer:TextAnnotation"
>       ],
>       "dc:created": "2015-12-07T11:22:07.740Z",
>       "dc:creator":
>
> "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
>       "dc:type": "dbp-ont:Place",
>       "enhancer:confidence": 0.99354976,
>       "enhancer:end": 18,
>       "enhancer:extracted-from":
> "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
>       "enhancer:selected-text": {
>         "@language": "en",
>         "@value": "France"
>       },
>       "enhancer:selection-context": {
>         "@language": "en",
>         "@value": "Paris is in France"
>       },
>       "enhancer:start": 12
>     }
>   ]
> }
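
A minimal sketch of how the enhancement artifacts in a response like [1]
could be paired up, assuming the JSON-LD has already been parsed into a
dictionary. SAMPLE is a trimmed, hypothetical stand-in for the response
above, and has_type / best_entities are illustrative names, not part of
Stanbol or ManifoldCF. The sketch links each EntityAnnotation to its
TextAnnotation through dc:relation and keeps the suggestion with the
highest enhancer:confidence. It also assumes dc:relation holds a single
id, as it does in the sample.

```python
import json

# Trimmed, hypothetical stand-in for an enhancer response; only the
# fields used below are kept.
SAMPLE = json.loads("""
{
  "@graph": [
    {"@id": "urn:enhancement-text-1",
     "@type": ["enhancer:Enhancement", "enhancer:TextAnnotation"],
     "enhancer:selected-text": {"@language": "en", "@value": "France"},
     "enhancer:start": 12, "enhancer:end": 18},
    {"@id": "urn:enhancement-entity-1",
     "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
     "dc:relation": "urn:enhancement-text-1",
     "enhancer:confidence": 1.0,
     "enhancer:entity-label": {"@language": "en", "@value": "France"},
     "enhancer:entity-reference": "http://dbpedia.org/resource/France"},
    {"@id": "urn:enhancement-entity-2",
     "@type": ["enhancer:Enhancement", "enhancer:EntityAnnotation"],
     "dc:relation": "urn:enhancement-text-1",
     "enhancer:confidence": 0.25715446,
     "enhancer:entity-label": {"@language": "en", "@value": "Vichy France"},
     "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France"}
  ]
}
""")


def has_type(node, type_name):
    """True if the node's @type (string or list) includes type_name."""
    types = node.get("@type", [])
    if isinstance(types, str):
        types = [types]
    return type_name in types


def best_entities(graph):
    """Map each selected text to its (entity URI, confidence) top suggestion."""
    text_nodes = {n["@id"]: n for n in graph
                  if has_type(n, "enhancer:TextAnnotation")}
    best = {}
    for node in graph:
        if not has_type(node, "enhancer:EntityAnnotation"):
            continue
        text_node = text_nodes.get(node.get("dc:relation"))
        if text_node is None:
            continue  # suggestion does not point at a known text annotation
        mention = text_node["enhancer:selected-text"]["@value"]
        candidate = (node["enhancer:entity-reference"],
                     node["enhancer:confidence"])
        if mention not in best or candidate[1] > best[mention][1]:
            best[mention] = candidate
    return best


result = best_entities(SAMPLE["@graph"])
```

Here the lower-confidence "Vichy France" suggestion is discarded in
favour of the dbpedia France resource for the same mention.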
>
>
>
>
>
>
> On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
>
> > Hi Dileepa,
> >
> > Repository connectors have an abstraction that allows them to generate
> > compound documents (where a document has a primary identifier, and there
> > are subdocuments that share that primary identifier and have a secondary
> > identifier).  This sounds a bit like what you are describing.  Does
> > Stanbol work by decorating an existing document, or does it work by
> > generating all content for a document?
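
To make the primary/secondary identifier idea concrete, here is a small
sketch of a store keyed by (primary id, secondary id). It is only an
illustration of the addressing scheme described above; CompoundStore and
its methods are hypothetical names and are not the ManifoldCF repository
connector API.

```python
from typing import Dict, List, Optional, Tuple

# A component is addressed by (primary id, secondary id); the primary
# document itself uses None as its secondary id in this sketch.
Key = Tuple[str, Optional[str]]


class CompoundStore:
    """Hypothetical in-memory store for compound documents."""

    def __init__(self) -> None:
        self._items: Dict[Key, str] = {}

    def put(self, primary_id: str, secondary_id: Optional[str],
            content: str) -> None:
        self._items[(primary_id, secondary_id)] = content

    def components(self, primary_id: str) -> List[Optional[str]]:
        """Secondary ids stored under one primary id, in insertion order."""
        return [sid for (pid, sid) in self._items if pid == primary_id]

    def remove(self, primary_id: str) -> int:
        """Drop the primary document and all of its subdocuments."""
        doomed = [key for key in self._items if key[0] == primary_id]
        for key in doomed:
            del self._items[key]
        return len(doomed)


store = CompoundStore()
store.put("doc-1", None, "Paris is in France")
store.put("doc-1", "entity-paris", "http://dbpedia.org/resource/Paris")
store.put("doc-1", "entity-france", "http://dbpedia.org/resource/France")
store.put("doc-2", None, "Another document")
```

Because every subdocument carries the primary id, removing "doc-1" can
find and remove its entity subdocuments without touching "doc-2".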
> >
> > Karl
> >
> >
> > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > wrote:
> >
> > > Hi All,
> > >
> > >
> > > While thanking you all for your input on the Stanbol connector
> > > requirement, I would like to continue modifying the Stanbol connector
> > > to be compatible with any output connector. If you can give some
> > > guidance on how the entity metadata should be added to the repository
> > > document, I can modify the Stanbol connector accordingly.
> > >
> > > From Rafa's comments, I gathered that we can add the entity metadata to
> > > the repository document as key-value pairs.
> > > However, this idea is not yet clear to me. There could be N entities in
> > > a document, and each of them will have some common attributes such as
> > > name, id and type, plus attributes specific to its entity type.
> > > I'm not clear on how to maintain that structure of N entities with
> > > their attributes in a repository document as key-value pairs and make
> > > them LDPath-compatible for retrieval in an output connector.
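
One possible encoding of N entities as flat key-value pairs is sketched
below. This is an assumption for illustration, not the scheme Rafa
proposed, and it does not address LDPath compatibility. Common attributes
(id, name, type) go into parallel multi-valued fields that stay aligned by
position, and type-specific attributes go into fields whose names carry
the entity's position. The field naming (entity_id, entity_0_lat, and so
on) and both function names are hypothetical.

```python
COMMON_KEYS = ("id", "name", "type")


def flatten_entities(entities):
    """Encode a list of entity dicts as {field name: [string values]}."""
    fields = {"entity_" + key: [] for key in COMMON_KEYS}
    for position, entity in enumerate(entities):
        for key in COMMON_KEYS:
            # Parallel fields stay aligned: value i belongs to entity i.
            fields["entity_" + key].append(str(entity.get(key, "")))
        for key, value in entity.items():
            if key not in COMMON_KEYS:
                # Type-specific attributes are keyed by entity position.
                fields["entity_%d_%s" % (position, key)] = [str(value)]
    return fields


def restore_entities(fields):
    """Rebuild the entity list (with string values) from flattened fields."""
    count = len(fields["entity_id"])
    entities = [{key: fields["entity_" + key][i] for key in COMMON_KEYS}
                for i in range(count)]
    for name, values in fields.items():
        parts = name.split("_", 2)
        if len(parts) == 3 and parts[1].isdigit():
            entities[int(parts[1])][parts[2]] = values[0]
    return entities


entities = [
    {"id": "http://dbpedia.org/resource/Paris", "name": "Paris",
     "type": "Place", "lat": 48.8567},
    {"id": "http://dbpedia.org/resource/France", "name": "France",
     "type": "Country"},
]
fields = flatten_entities(entities)
```

The trade-off is that all values become strings and the grouping lives in
the field names, so an output connector has to know the naming convention
to put the entities back together.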
> > >
> > > @Rafa
> > > If you could elaborate on your suggestion, it would be greatly helpful
> > > to me.
> > > All other suggestions are also welcome.
> > >
> > > Thanks,
> > > Dileepa
> > >
> > >
> > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> > >
> > > > I, too, agree.  Somebody will need to turn this connector into one
> > > > that plays by the rules.  It may be possible for someone on the team
> > > > here to do that, but it won't be me; I'm seriously overextended at the
> > > > moment.  It would be best if someone who knew the connector well could
> > > > do the necessary work.
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > > >
> > > > > I must agree with Antonio. When I started to work on this I was
> > > > > expecting the connector to work by just extracting the entities and
> > > > > their metadata and adding them as plain metadata of the documents,
> > > > > probably driven by a configuration of LDPath queries.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > This is probably OK for Sensefy, but I don't think it would be
> > > > > suitable for inclusion in the project. But this is only my opinion.
> > > > > Of course, a version of the connector that fully respects the
> > > > > ManifoldCF architecture would be more than welcome, in my opinion.
> > > > >
> > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > <adperezmorales@gmail.com> wrote:
> > > > >
> > > > > > Hi
> > > > > > The removal of the SolrWrapper is a must. It was a requirement for
> > > > > > an internal project, which has nothing to do with the normal
> > > > > > operation of Manifold; forcing users to use Solr does not fit the
> > > > > > Manifold philosophy.
> > > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > > dependency, which will fit almost no use case, is not very useful.
> > > > > > You should think of a way to convert the Stanbol connector into a
> > > > > > normal transformation connector without assuming that a specific
> > > > > > output connector will be used.
> > > > > > Regards
> > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > >> Hi guys,
> > > > > >>
> > > > > >> I have developed a Stanbol connector for MCF. You can check it out
> > > > > >> from our GitHub repo here:
> > > > > >>
> > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > >>
> > > > > >> It requires the SolrWrapper output connector, which indexes enhanced
> > > > > >> documents, entities and entityTypes in separate Solr cores. Basically
> > > > > >> it requires 3 separate Solr cores configured with a specific Solr
> > > > > >> schema for primary documents, entities and entityTypes. This was
> > > > > >> done for our specific use case.
> > > > > >>
> > > > > >> The SolrWrapper code is here:
> > > > > >>
> > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > >>
> > > > > >> Perhaps we can discuss removing the Stanbol connector's dependency
> > > > > >> on SolrWrapper and having it work with any output connector.
> > > > > >> Please note that the Stanbol connector currently has a bug in the
> > > > > >> UI (editSpecification), which I'm working on at the moment. After
> > > > > >> fixing that I will update here. I will also provide documentation
> > > > > >> for configuring the connector.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Dileepa
> > > > > >>
> > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> > > > > >> <adperezmorales@gmail.com> wrote:
> > > > > >>
> > > > > >> > Hi Joshua
> > > > > >> >
> > > > > >> > This is not the list for that, but Marmotta is already integrated
> > > > > >> > in Apache Stanbol. You can take a look at this issue:
> > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165
> > > > > >> >
> > > > > >> > Anyway, as I said, this is not the list for that, so let's use the
> > > > > >> > proper list for these things.
> > > > > >> >
> > > > > >> > Regards
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > > >> >
> > > > > >> > > Hey Dileepa,
> > > > > >> > >
> > > > > >> > > In case you are interested, I pinged the list a few days ago
> > > > > >> > > asking for integration tips for Apache Marmotta.
> > > > > >> > >
> > > > > >> > > I got some great tips on how to do this, which could help you.
> > > > > >> > > Since Marmotta is a drop-in replacement for Clerezza on Stanbol,
> > > > > >> > > it may be easier for you to go this way.
> > > > > >> > >
> > > > > >> > > I'm not a Java programmer, but I'm bringing this problem to the
> > > > > >> > > development staff at my company for assistance. If you like the
> > > > > >> > > Marmotta approach, we may gain more traction solving the same
> > > > > >> > > integration.
> > > > > >> > >
> > > > > >> > > I'm also integrating Marmotta with Stanbol, so the effect would
> > > > > >> > > be the same, except not using the Stanbol API for data import in
> > > > > >> > > favor of Marmotta.
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > >
> > > > > >> > > -J
> > > > > >> > >
> > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > >> > > >
> > > > > >> > > > Hi all,
> > > > > >> > > >
> > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > >> > > > As the first step, I would like to start by creating an
> > > > > >> > > > architecture diagram for the connector.
> > > > > >> > > > I will send the diagram for your review soon.
> > > > > >> > > >
> > > > > >> > > > Thanks,
> > > > > >> > > > Dileepa
> > > > > >> > > >
> > > > > >> > > > --
> > > > > >> > > >
> > > > > >> > > > ------------------------------
> > > > > >> > > > This message should be regarded as confidential. If you have
> > > > > >> > > > received this email in error please notify the sender and destroy
> > > > > >> > > > it immediately. Statements of intent shall only become binding
> > > > > >> > > > when confirmed in hard copy by an authorised signatory.
> > > > > >> > > >
> > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration
> > > > > >> > > > number 6440931. The Registered Office is Brook House, 229 Shepherds
> > > > > >> > > > Bush Road, London W6 7AN.
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> > >
> >
>
>
