manifoldcf-dev mailing list archives

From Dileepa Jayakody <djayak...@zaizi.com>
Subject Re: ManifoldCF transformation connector for Apache Stanbol
Date Mon, 07 Dec 2015 11:59:43 GMT
Hi Karl,

Thanks a lot for the pointer.

Stanbol doesn't update an existing document; it generates a new response
with the requested enhancement details for each content enhancement request.
For example, for a request like "Paris is in France", Stanbol returns the
following RDF response [1] (serialized as JSON-LD).

In the Stanbol connector, enhancement artifacts such as TextAnnotations
and EntityAnnotations are extracted from the RDF response to generate
entity abstractions, which are added to the MCF repository document.
Currently, the Stanbol connector adds these entity abstractions as JSON
strings to a multi-valued 'entities' field in the repository document, and
the SolrWrapper output connector parses that JSON to index the data in
separate Solr cores (primary documents, linked entities, and entity types
with their attributes).
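A minimal sketch of that extraction step follows, assuming the JSON-LD response has already been parsed into simple Java 16+ records. The TextAnnotation and EntityAnnotation records, the EntityAbstractionSketch class and the JSON layout of each 'entities' value are hypothetical, not the connector's actual code. It links each EntityAnnotation to its TextAnnotation through dc:relation, as in the sample response [1], and keeps only the highest-confidence candidate per mention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, already-parsed views of the two Stanbol annotation types.
record TextAnnotation(String id, String selectedText, int start, int end) {}
record EntityAnnotation(String relation, String reference, String label, double confidence) {}

class EntityAbstractionSketch {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Keeps the highest-confidence EntityAnnotation per TextAnnotation (matched
    // via dc:relation) and renders each pair as one JSON string, i.e. one value
    // of a multi-valued 'entities' field.
    static List<String> toEntityValues(List<TextAnnotation> mentions,
                                       List<EntityAnnotation> candidates) {
        Map<String, EntityAnnotation> best = new LinkedHashMap<>();
        for (EntityAnnotation e : candidates) {
            best.merge(e.relation(), e,
                (a, b) -> a.confidence() >= b.confidence() ? a : b);
        }
        List<String> values = new ArrayList<>();
        for (TextAnnotation t : mentions) {
            EntityAnnotation e = best.get(t.id());
            if (e == null) {
                continue; // mention with no linked entity
            }
            values.add("{\"text\":" + quote(t.selectedText())
                + ",\"start\":" + t.start()
                + ",\"end\":" + t.end()
                + ",\"uri\":" + quote(e.reference())
                + ",\"label\":" + quote(e.label())
                + ",\"confidence\":" + e.confidence() + "}");
        }
        return values;
    }

    public static void main(String[] args) {
        String paris = "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84";
        String france = "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a";
        List<TextAnnotation> mentions = List.of(
            new TextAnnotation(paris, "Paris", 0, 5),
            new TextAnnotation(france, "France", 12, 18));
        List<EntityAnnotation> candidates = List.of(
            new EntityAnnotation(france, "http://dbpedia.org/resource/France", "France", 1.0),
            new EntityAnnotation(france, "http://dbpedia.org/resource/Vichy_France", "Vichy France", 0.25715446),
            new EntityAnnotation(paris, "http://dbpedia.org/resource/Paris_Commune", "Paris Commune", 0.1493264));
        toEntityValues(mentions, candidates).forEach(System.out::println);
    }
}
```

With the sample data, the "France" mention resolves to http://dbpedia.org/resource/France (confidence 1.0) rather than Vichy France (0.25715446). Keeping every candidate instead of only the best one would be an equally valid choice; it only changes the merge step.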

Can we have a primary repository document and create sub-documents for
the extracted entities? Is it possible to generate sub-documents for a
repository document in a transformation connector?
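To make the sub-document question concrete, here is a hypothetical sketch of the data shape only: each distinct entity becomes a sub-document that shares the primary document's identifier and uses the entity URI as its secondary identifier. The SubDocument record, the '#' separator in compositeId() and the example document URL are assumptions; the sketch does not use ManifoldCF's compound-document support or any ManifoldCF API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sub-document: shares the primary document's identifier and
// adds a secondary identifier (here, the entity URI).
record SubDocument(String primaryId, String secondaryId) {
    // One possible flat key for an output index; the '#' separator is an assumption.
    String compositeId() {
        return primaryId + "#" + secondaryId;
    }
}

class CompoundDocumentSketch {
    // Creates one sub-document per distinct entity URI. LinkedHashSet drops
    // repeated mentions of the same entity and keeps first-seen order.
    static List<SubDocument> subDocumentsFor(String primaryId, List<String> entityUris) {
        List<SubDocument> subs = new ArrayList<>();
        for (String uri : new LinkedHashSet<>(entityUris)) {
            subs.add(new SubDocument(primaryId, uri));
        }
        return subs;
    }

    public static void main(String[] args) {
        List<SubDocument> subs = subDocumentsFor(
            "http://example.com/docs/paris.txt", // hypothetical primary document id
            List.of("http://dbpedia.org/resource/France",
                    "http://dbpedia.org/resource/Paris_Commune",
                    "http://dbpedia.org/resource/France"));
        subs.forEach(s -> System.out.println(s.compositeId()));
    }
}
```

Because entity URIs may themselves contain '#', a real implementation would need a separator or encoding that cannot collide with them.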

Thanks.
Dileepa

[1] Sample Stanbol response

{
  "@context": {
    "dbp-ont": "http://dbpedia.org/ontology/",
    "dc": "http://purl.org/dc/terms/",
    "dc:created": {
      "@type": "xsd:dateTime"
    },
    "enhancer": "http://fise.iks-project.eu/ontology/",
    "enhancer:confidence": {
      "@type": "xsd:double"
    },
    "enhancer:end": {
      "@type": "xsd:int"
    },
    "enhancer:entity-reference": {
      "@type": "@id"
    },
    "enhancer:entity-type": {
      "@type": "@id"
    },
    "enhancer:extracted-from": {
      "@type": "@id"
    },
    "enhancer:start": {
      "@type": "xsd:int"
    },
    "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "foaf:depiction": {
      "@type": "@id"
    },
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "schema": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@graph": [
    {
      "@id": "http://dbpedia.org/resource/France",
      "@type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing",
        "schema:Country",
        "schema:Place"
      ],
      "foaf:depiction": [
        "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
        "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
      ],
      "rdfs:comment": {
        "@language": "en",
        "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
      },
      "rdfs:label": [
        {
          "@language": "en",
          "@value": "France"
        },
        {
          "@language": "fr",
          "@value": "France"
        }
      ]
    },
    {
      "@id": "http://dbpedia.org/resource/Paris",
      "@type": [
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "dbp-ont:Settlement",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing",
        "schema:Place"
      ],
      "foaf:depiction": [
        "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
        "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
      ],
      "geo:lat": 48.8567,
      "geo:long": 2.3508,
      "rdfs:comment": {
        "@language": "en",
        "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
      },
      "rdfs:label": [
        {
          "@language": "en",
          "@value": "Paris"
        },
        {
          "@language": "fr",
          "@value": "Paris"
        }
      ]
    },
    {
      "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:TextAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.740Z",
      "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
      "dc:type": "dbp-ont:Place",
      "enhancer:confidence": 0.6017613,
      "enhancer:end": 5,
      "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "enhancer:selected-text": {
        "@language": "en",
        "@value": "Paris"
      },
      "enhancer:selection-context": {
        "@language": "en",
        "@value": "Paris is in France"
      },
      "enhancer:start": 0
    },
    {
      "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:EntityAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.748Z",
      "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
      "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "enhancer:confidence": 1.0,
      "enhancer:entity-label": {
        "@language": "en",
        "@value": "France"
      },
      "enhancer:entity-reference": "http://dbpedia.org/resource/France",
      "enhancer:entity-type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "schema:Country",
        "schema:Place",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing"
      ],
      "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "entityhub:site": "dbpedia"
    },
    {
      "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:EntityAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.748Z",
      "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
      "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "enhancer:confidence": 0.25715446,
      "enhancer:entity-label": {
        "@language": "en",
        "@value": "Vichy France"
      },
      "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
      "enhancer:entity-type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "schema:Country",
        "schema:Place",
        "http://www.opengis.net/gml/_Feature",
        "owl:Thing"
      ],
      "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "entityhub:site": "dbpedia"
    },
    {
      "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:EntityAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.748Z",
      "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
      "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
      "enhancer:confidence": 0.1493264,
      "enhancer:entity-label": {
        "@language": "en",
        "@value": "Paris Commune"
      },
      "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
      "enhancer:entity-type": [
        "dbp-ont:Country",
        "dbp-ont:Place",
        "dbp-ont:PopulatedPlace",
        "schema:Country",
        "schema:Place",
        "owl:Thing"
      ],
      "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "entityhub:site": "dbpedia"
    },
    {
      "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
      "@type": [
        "enhancer:Enhancement",
        "enhancer:TextAnnotation"
      ],
      "dc:created": "2015-12-07T11:22:07.740Z",
      "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
      "dc:type": "dbp-ont:Place",
      "enhancer:confidence": 0.99354976,
      "enhancer:end": 18,
      "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
      "enhancer:selected-text": {
        "@language": "en",
        "@value": "France"
      },
      "enhancer:selection-context": {
        "@language": "en",
        "@value": "Paris is in France"
      },
      "enhancer:start": 12
    }
  ]
}
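The enhancer:start and enhancer:end values in the sample appear to be zero-based character offsets, with start inclusive and end exclusive: 0..5 maps to "Paris" and 12..18 maps to "France" in "Paris is in France" (which in this sample is also the enhancer:selection-context). The small hypothetical helper below shows that reading; it is inferred from this sample and is not a Stanbol API.

```java
class SelectionOffsetsSketch {
    // Returns the characters from enhancer:start (inclusive) to enhancer:end
    // (exclusive), rejecting offsets that fall outside the text.
    static String selected(String text, int start, int end) {
        if (start < 0 || end > text.length() || start > end) {
            throw new IllegalArgumentException("offsets out of range: " + start + ".." + end);
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        String text = "Paris is in France";
        System.out.println(selected(text, 0, 5));   // prints Paris
        System.out.println(selected(text, 12, 18)); // prints France
    }
}
```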






On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Dileepa,
>
> Repository connectors have an abstraction that allows them to generate
> compound documents (where a document has a primary identifier, and there
> are subdocuments that share that primary identifier and have a secondary
> identifier).  This sounds a bit like what you are describing.  Does Stanbol
> work by decorating an existing document, or does it work by generating all
> content for a document?
>
> Karl
>
>
> On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
>
> > Hi All,
> >
> >
> > Thank you all for your input on the Stanbol connector requirements. I
> > would like to continue modifying the Stanbol connector so that it is
> > compatible with any output connector. If you can give some guidance on
> > how the entity metadata should be added to the repository document, I can
> > modify the Stanbol connector accordingly.
> >
> > From Rafa's comments, I gathered that we can add the entity metadata to
> > the repository document as key-value pairs.
> > However, this idea is not yet clear to me. There could be N entities in
> > a document, and each of them will have some common attributes such as
> > name, id and type, as well as attributes specific to a particular entity
> > type. I'm not clear on how to maintain that structure of N entities with
> > their attributes in a repository document as key-value pairs and make
> > them LDPath-compatible for retrieval in an output connector.
> >
> > @Rafa
> > If you could elaborate on your suggestion, it would be very helpful to
> > me.
> > All other suggestions are also welcome.
> >
> > Thanks,
> > Dileepa
> >
> >
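One possible way to keep N entities apart using only flat key/value fields is to put each entity's position in the field name, so that common attributes (URI, label, types) and type-specific ones (for example geo:lat and geo:long for Paris in the sample response) stay grouped. The sketch below uses a plain Map<String, List<String>> rather than the ManifoldCF RepositoryDocument API, and the "entity.<n>.<attribute>" naming scheme and the Entity record are hypothetical, not a convention proposed in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity: common attributes plus type-specific extras.
record Entity(String uri, String label, List<String> types, Map<String, String> extras) {}

class EntityFieldFlattener {
    // Flattens N entities into multi-valued key/value fields named
    // "entity.<n>.<attribute>", so each entity's attributes stay grouped even
    // when different entity types carry different attributes.
    static Map<String, List<String>> flatten(List<Entity> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (int i = 0; i < entities.size(); i++) {
            Entity e = entities.get(i);
            String prefix = "entity." + i + ".";
            fields.put(prefix + "uri", List.of(e.uri()));
            fields.put(prefix + "label", List.of(e.label()));
            fields.put(prefix + "type", List.copyOf(e.types()));
            for (Map.Entry<String, String> x : e.extras().entrySet()) {
                fields.put(prefix + x.getKey(), List.of(x.getValue()));
            }
        }
        // Lets a consumer iterate over entities without scanning every key.
        fields.put("entity.count", List.of(Integer.toString(entities.size())));
        return fields;
    }

    public static void main(String[] args) {
        List<Entity> entities = List.of(
            new Entity("http://dbpedia.org/resource/Paris", "Paris",
                List.of("dbp-ont:Settlement", "schema:Place"),
                Map.of("geo:lat", "48.8567", "geo:long", "2.3508")),
            new Entity("http://dbpedia.org/resource/France", "France",
                List.of("dbp-ont:Country", "schema:Country"),
                Map.of()));
        flatten(entities).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The trade-off is that every consumer must know the naming scheme, whereas the JSON-string approach moves that knowledge into a parser in the output connector.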
> > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com> wrote:
> >
> > > I, too, agree.  Somebody will need to turn this connector into one that
> > > plays by the rules.  It may be possible for someone on the team here to
> > > do that, but it won't be me; I'm seriously overextended at the moment.
> > > It would be best if someone who knew the connector well could do the
> > > necessary work.
> > >
> > > Karl
> > >
> > >
> > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com> wrote:
> > >
> > > > I must agree with Antonio. When I started to work on this, I was
> > > > expecting the connector to work by just extracting the entities and
> > > > their metadata and putting them as plain metadata on the documents,
> > > > probably following an LDPath query configuration.
> > > >
> > > > This is probably OK for Sensefy, but I don't think it would be
> > > > suitable for inclusion in the project. But this is only my opinion.
> > > > Of course, a version of the connector that fully respects the
> > > > ManifoldCF architecture would be more than welcome in my opinion.
> > > >
> > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > <adperezmorales@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > The removal of the SolrWrapper is a must. It was a requirement for an
> > > > > internal project, which has nothing to do with the normal operation of
> > > > > Manifold, so forcing users to use Solr does not fit the Manifold
> > > > > philosophy.
> > > > > In my opinion, at this moment, a Stanbol connector with such a big
> > > > > dependency, which will not fit most use cases, is not very useful.
> > > > > You should think of a way to convert the Stanbol connector into a
> > > > > normal transformation connector without assuming that a specific
> > > > > output connector will be used.
> > > > > Regards
> > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > >> Hi guys,
> > > > >>
> > > > >> I have developed a Stanbol connector for MCF. You can check it out
> > > > >> from our GitHub repo here:
> > > > >>
> > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > >>
> > > > >> It requires the SolrWrapper output connector, which indexes enhanced
> > > > >> documents, entities and entityTypes in separate Solr cores. Basically,
> > > > >> it requires three separate Solr cores, each configured with a specific
> > > > >> Solr schema: one for primary documents, one for entities and one for
> > > > >> entityTypes. This was done for our specific use case.
> > > > >>
> > > > >> The SolrWrapper code is here:
> > > > >>
> > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > >>
> > > > >> Perhaps we can discuss removing the Stanbol connector's dependency on
> > > > >> SolrWrapper so that it works with any output connector.
> > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > >> (editSpecification), which I'm working on at the moment. After fixing
> > > > >> that, I will post an update here. I will also provide documentation
> > > > >> for configuring the connector.
> > > > >>
> > > > >> Thanks,
> > > > >> Dileepa
> > > > >>
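For reference, the fan-out this three-core design implies can be sketched as below: one enhanced document is split into one record for the primary-document core, one per distinct entity and one per distinct entity type. The LinkedEntity record, the core names and the field names are hypothetical; this is a guess at the shape of the split, not the SolrWrapper code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical linked entity as it might come out of the enhancement step.
record LinkedEntity(String uri, String label, List<String> types) {}

class ThreeCoreRoutingSketch {
    // Splits one enhanced document into the records each of three cores would
    // receive. Core and field names are placeholders, not the real schemas.
    static Map<String, List<Map<String, Object>>> route(String docId, String content,
                                                        List<LinkedEntity> entities) {
        List<Map<String, Object>> entityRecords = new ArrayList<>();
        List<Map<String, Object>> typeRecords = new ArrayList<>();
        Set<String> seenEntities = new LinkedHashSet<>();
        Set<String> seenTypes = new LinkedHashSet<>();

        for (LinkedEntity e : entities) {
            // One record per distinct entity.
            if (seenEntities.add(e.uri())) {
                Map<String, Object> rec = Map.of("id", e.uri(), "label", e.label(), "types", e.types());
                entityRecords.add(rec);
            }
            // One record per distinct entity type.
            for (String t : e.types()) {
                if (seenTypes.add(t)) {
                    Map<String, Object> rec = Map.of("id", t);
                    typeRecords.add(rec);
                }
            }
        }
        // The primary document only references its entities by URI.
        List<String> uris = new ArrayList<>(seenEntities);
        Map<String, Object> doc = Map.of("id", docId, "content", content, "entities", uris);

        Map<String, List<Map<String, Object>>> byCore = new LinkedHashMap<>();
        byCore.put("documents", List.of(doc));
        byCore.put("entities", entityRecords);
        byCore.put("entityTypes", typeRecords);
        return byCore;
    }

    public static void main(String[] args) {
        List<LinkedEntity> entities = List.of(
            new LinkedEntity("http://dbpedia.org/resource/France", "France",
                List.of("dbp-ont:Country", "dbp-ont:Place")),
            new LinkedEntity("http://dbpedia.org/resource/Paris_Commune", "Paris Commune",
                List.of("dbp-ont:Country", "dbp-ont:Place")));
        route("doc-1", "Paris is in France", entities)
            .forEach((core, recs) -> System.out.println(core + ": " + recs.size() + " record(s)"));
    }
}
```

Because the whole split is specific to one output layout, it shows why the thread wants this logic outside the transformation connector.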
> > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales <adperezmorales@gmail.com> wrote:
> > > > >>
> > > > >> > Hi Joshua,
> > > > >> >
> > > > >> > This is not the list for that, but Marmotta is already integrated in
> > > > >> > Apache Stanbol. You can take a look at this issue:
> > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165
> > > > >> >
> > > > >> > Anyway, as I said, this is not the list for that, so let's use the
> > > > >> > proper list for these things.
> > > > >> >
> > > > >> > Regards
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <joshua.dunham@gmail.com>:
> > > > >> >
> > > > >> > > Hey Dileepa,
> > > > >> > >
> > > > >> > > In case you are interested, I pinged the list a few days ago asking
> > > > >> > > for integration tips for Apache Marmotta.
> > > > >> > >
> > > > >> > > I got some great tips on how to do this, which could help you. Since
> > > > >> > > Marmotta is a drop-in replacement for Clerezza in Stanbol, it may be
> > > > >> > > easier for you to take this route.
> > > > >> > >
> > > > >> > > I'm not a Java programmer, but I'm bringing this problem to the
> > > > >> > > development staff at my company for assistance. If you like the
> > > > >> > > Marmotta approach, we may gain more traction solving the same
> > > > >> > > integration.
> > > > >> > >
> > > > >> > > I'm also integrating Marmotta with Stanbol, so the effect would be
> > > > >> > > the same, except that Marmotta, rather than the Stanbol API, is used
> > > > >> > > for data import.
> > > > >> > >
> > > > >> > > Best,
> > > > >> > >
> > > > >> > > -J
> > > > >> > >
> > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > >> > > >
> > > > >> > > > Hi all,
> > > > >> > > >
> > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > >> > > > Let me get back to you on where to start the code base.
> > > > >> > > > As a first step, I would like to create an architecture diagram
> > > > >> > > > for the connector.
> > > > >> > > > I will send the diagram for your review soon.
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Dileepa
> > > > >> > > >
> > > > >> > > > --
> > > > >> > > >
> > > > >> > > > ------------------------------
> > > > >> > > > This message should be regarded as confidential. If you have received
> > > > >> > > > this email in error please notify the sender and destroy it immediately.
> > > > >> > > > Statements of intent shall only become binding when confirmed in hard
> > > > >> > > > copy by an authorised signatory.
> > > > >> > > >
> > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > >> > > > London W6 7AN.
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > >
> > >
> >
> >
>

