manifoldcf-dev mailing list archives

From Dileepa Jayakody <djayak...@zaizi.com>
Subject Re: ManifoldCF transformation connector for Apache Stanbol
Date Sat, 12 Dec 2015 19:35:47 GMT
Hi Karl,

Yes, I will improve the code based on Rafa's review, and then we can import it
into the MCF code base.

Thanks
Dileepa

On Sat, Dec 12, 2015 at 5:26 PM, Karl Wright <daddywri@gmail.com> wrote:

> Ok, it seems premature for me to try to import this from Github today, so
> I'll wait until the dust settles a bit further first.
>
> Karl
>
>
> On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <djayakody@zaizi.com>
> wrote:
>
> > Thanks a lot, Rafa, for pointing that out. It was a big miss, as I haven't
> > tested the LDPath configuration part yet. More improvements are needed.
> > I will make the required improvements as pointed out.
> >
> > Regards,
> > Dileepa
> >
> >
> > On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rharo@apache.org> wrote:
> >
> > > Hi Dileepa,
> > >
> > > The problem is not in that part of the code; it is rather in this part:
> > >
> > > if (entity != null) {
> > >     Collection<String> properties = entity.getProperties();
> > >     for (String property : properties) {
> > >         String targetFieldName = derefFields.get(property);
> > >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> > >         if (propValues == null) {
> > >             propValues = new HashSet<String>();
> > >         }
> > >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> > >         propValues.addAll(entityPropValues);
> > >         entityPropertyMap.put(targetFieldName, propValues);
> > >     }
> > > }
> > >
> > > From the EnhancementStructure response you are collecting only the
> > > configured dereferenced fields; LDPath fields are ignored. There is also
> > > a potential bug in that code when no dereferencing field is configured
> > > for a given entity property:
> > >
> > > String targetFieldName = derefFields.get(property);
> > >
> > > targetFieldName would be null in that case. Instead of trying to index
> > > every property, you should collect only the ones configured by the user
> > > (or, if the user wants all of them, provide a configuration option for
> > > that).
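A minimal Java sketch of the fix suggested above: skip any entity property that has no configured target field, so that a null key never reaches the map. The `Entity` interface is a hypothetical stand-in for the Stanbol client type used in the connector; only `getProperties()` and `getPropertyValues()` come from the quoted snippet, and the class name `ConfiguredFieldCollector` is invented for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ConfiguredFieldCollector {

    /** Hypothetical stand-in for the Stanbol client's entity type. */
    public interface Entity {
        Collection<String> getProperties();
        Collection<String> getPropertyValues(String property);
    }

    /**
     * Collects values only for properties that have a configured target field.
     * derefFields maps an entity property URI to the target field name.
     */
    public static Map<String, Set<String>> collect(Entity entity,
                                                   Map<String, String> derefFields) {
        Map<String, Set<String>> entityPropertyMap = new HashMap<>();
        if (entity == null) {
            return entityPropertyMap;
        }
        for (String property : entity.getProperties()) {
            String targetFieldName = derefFields.get(property);
            if (targetFieldName == null) {
                // Not configured by the user: skip rather than store under a null key.
                continue;
            }
            entityPropertyMap
                .computeIfAbsent(targetFieldName, k -> new HashSet<>())
                .addAll(entity.getPropertyValues(property));
        }
        return entityPropertyMap;
    }
}
```

An "index all properties" option, as Rafa mentions, could be layered on top by falling back to the property URI itself as the field name when no mapping exists; that variant is not shown here.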
> > >
> > > Anyway, going back to the LDPath issue: please note that when you
> > > define a field you must use a custom namespace and prefix so that you
> > > can later retrieve that property from the entity. If you don't,
> > > Stanbol will assign a random namespace to that property. Check this
> > > example from the RedLink SDK:
> > >
> > > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> > >
> > > Hope that helps
> > >
> > > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <daddywri@gmail.com> wrote:
> > >
> > > > The next step would be to pull this code into an svn branch.  This is
> > > > something I can tackle after the 2.3 release candidate is put together.
> > > >
> > > > Thanks,
> > > > Karl
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > wrote:
> > > >
> > > > > Hi Rafa,
> > > > >
> > > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > > comments inline below.
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rharo@apache.org> wrote:
> > > > >
> > > > > > Hi Dileepa,
> > > > > >
> > > > > > In my opinion this is now clearly going in the right direction.
> > > > > > Quick comments after a first review:
> > > > > >
> > > > > >
> > > > > >    - Rejecting a document because it can't be enhanced is rather
> > > > > >    harsh. You are preventing a document from being indexed because
> > > > > >    the enhancement failed; it is probably better to let it continue
> > > > > >    through the workflow within the system.
> > > > > >
> > > > >
> > > > > Got your point. Will remove that part from the code
> > > > >
> > > > >
> > > > > >    - As far as I can tell from the code, you are correctly extracting
> > > > > >    the configured dereferenced fields, but you are not processing
> > > > > >    the LDPath results at all.
> > > > > >
> > > > > I'm passing the LDPath program to Stanbol as an enhancer parameter to
> > > > > retrieve the enhancement result according to the LDPath program (which
> > > > > is given as a text string in the connector UI).
> > > > > If the user has not defined an LDPath program and has added dereference
> > > > > fields in the UI instead, then the enhancement request is built using
> > > > > the dereference fields as enhancer parameters.
> > > > >
> > > > >
> > > > > If neither an LDPath program nor dereference fields are given in the
> > > > > transformation UI, then I just call the given enhancement chain without
> > > > > any other enhancer parameters.
> > > > >
> > > > > Please refer to the code segment below, where I do this, and let me
> > > > > know if it needs more improvements.
> > > > >
> > > > >             // ldpath program is given priority if it's set
> > > > >             if (ldPath != null)
> > > > >             {
> > > > >                 parameters = EnhancerParameters.builder()
> > > > >                         .setChain(chain)
> > > > >                         .setContent(content)
> > > > >                         .setLDpathProgram(ldPath)
> > > > >                         .build();
> > > > >             }
> > > > >             else if (!derefFields.isEmpty())
> > > > >             {
> > > > >                 parameters = EnhancerParameters.builder()
> > > > >                         .setChain(chain)
> > > > >                         .setContent(content)
> > > > >                         .setDereferencingFields(derefFields.keySet())
> > > > >                         .build();
> > > > >             }
> > > > >             else
> > > > >             {
> > > > >                 parameters = EnhancerParameters.builder()
> > > > >                         .setChain(chain)
> > > > >                         .setContent(content)
> > > > >                         .build();
> > > > >             }
> > > > >             eRes = enhancerClient.enhance(parameters);
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Dileepa
> > > > >
> > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Rafa
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > As per our discussion, I have modified the Stanbol connector so
> > > > > > > that it adds all extracted entity URIs and entity attributes to
> > > > > > > the repository document as fields.
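One way to picture "entity URIs and attributes as fields" is the Java sketch below, which merges the values from all extracted entities into multi-valued fields keyed by field name. It is only an illustration under stated assumptions, not the connector's actual code: `ExtractedEntity`, `EntityFieldFlattener`, and the `entity_uri` field name used in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EntityFieldFlattener {

    /** Hypothetical holder: one entity URI plus attribute values keyed by target field name. */
    public static final class ExtractedEntity {
        final String uri;
        final Map<String, List<String>> attributes;

        public ExtractedEntity(String uri, Map<String, List<String>> attributes) {
            this.uri = uri;
            this.attributes = attributes;
        }
    }

    /**
     * Merges all entities into multi-valued fields. Entity URIs go into uriField;
     * every attribute goes into the field named by its key. Duplicates are dropped
     * and insertion order is kept.
     */
    public static Map<String, String[]> flatten(List<ExtractedEntity> entities, String uriField) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (ExtractedEntity entity : entities) {
            merged.computeIfAbsent(uriField, k -> new LinkedHashSet<>()).add(entity.uri);
            for (Map.Entry<String, List<String>> attr : entity.attributes.entrySet()) {
                merged.computeIfAbsent(attr.getKey(), k -> new LinkedHashSet<>())
                      .addAll(attr.getValue());
            }
        }
        Map<String, String[]> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : merged.entrySet()) {
            fields.put(e.getKey(), e.getValue().toArray(new String[0]));
        }
        return fields;
    }
}
```

Each resulting entry could then be handed to the repository document, for example via `RepositoryDocument.addField(name, values)`, assuming the `String[]` overload is the one in use. Note that this flattening does not record which attribute value belongs to which entity, which is the trade-off of plain key-value metadata discussed earlier in the thread.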
> > > > > > >
> > > > > > > I have committed this code on a separate branch of our GitHub
> > > > > > > project, sensefy-connectors.
> > > > > > > You can find the source code here:
> > > > > > >
> > > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > > >
> > > > > > > Let me know your feedback.
> > > > > > >
> > > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > > enhancement results, and share it with you.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dileepa
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <daddywri@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Dileepa,
> > > > > > > >
> > > > > > > > You cannot create sub-documents in a transformation connector.
> > > > > > > > And adding that capability to the framework is not possible; we
> > > > > > > > would be missing key bookkeeping logic if that were allowed.
> > > > > > > >
> > > > > > > > Karl
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Karl,
> > > > > > > > >
> > > > > > > > > Thanks a lot for the pointer.
> > > > > > > > >
> > > > > > > > > Stanbol doesn't update an existing document; it generates a
> > > > > > > > > new response with the requested enhancement details for each
> > > > > > > > > content enhancement request.
> > > > > > > > > For example, for a request like "Paris is a city in France",
> > > > > > > > > Stanbol returns the RDF response shown in [1].
> > > > > > > > >
> > > > > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > > > > TextAnnotations and EntityAnnotations are extracted from the
> > > > > > > > > RDF response to generate the entity abstractions, which are
> > > > > > > > > then added to the MCF repository document. Currently the
> > > > > > > > > Stanbol connector adds these entity abstractions as JSON
> > > > > > > > > strings to a multi-valued 'entities' field in the repository
> > > > > > > > > document, and we parse that JSON in the SolrWrapper output
> > > > > > > > > connector to index them in separate Solr cores (primary
> > > > > > > > > documents, linked entities, and entity types with their
> > > > > > > > > attributes).
> > > > > > > > >
> > > > > > > > > Can we have a primary repository document and create
> > > > > > > > > sub-documents for the extracted entities? Is it possible to
> > > > > > > > > generate sub-documents for a repository document in a
> > > > > > > > > transformation connector?
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > > Dileepa
> > > > > > > > >
> > > > > > > > > [1] Sample Stanbol response
> > > > > > > > >
> > > > > > > > > {
> > > > > > > > >   "@context": {
> > > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > > >     "dc:created": {
> > > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > > >     },
> > > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > > >     "enhancer:confidence": {
> > > > > > > > >       "@type": "xsd:double"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:end": {
> > > > > > > > >       "@type": "xsd:int"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:entity-reference": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:entity-type": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:extracted-from": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "enhancer:start": {
> > > > > > > > >       "@type": "xsd:int"
> > > > > > > > >     },
> > > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > > >     "foaf:depiction": {
> > > > > > > > >       "@type": "@id"
> > > > > > > > >     },
> > > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > > >     "schema": "http://schema.org/",
> > > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > > >   },
> > > > > > > > >   "@graph": [
> > > > > > > > >     {
> > > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > > >       "@type": [
> > > > > > > > >         "dbp-ont:Country",
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > >         "owl:Thing",
> > > > > > > > >         "schema:Country",
> > > > > > > > >         "schema:Place"
> > > > > > > > >       ],
> > > > > > > > >       "foaf:depiction": [
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > > > >       ],
> > > > > > > > >       "rdfs:comment": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > > > >       },
> > > > > > > > >       "rdfs:label": [
> > > > > > > > >         {
> > > > > > > > >           "@language": "en",
> > > > > > > > >           "@value": "France"
> > > > > > > > >         },
> > > > > > > > >         {
> > > > > > > > >           "@language": "fr",
> > > > > > > > >           "@value": "France"
> > > > > > > > >         }
> > > > > > > > >       ]
> > > > > > > > >     },
> > > > > > > > >
> > > > > > > > >     {
> > > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > > >       "@type": [
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "dbp-ont:Settlement",
> > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > >         "owl:Thing",
> > > > > > > > >         "schema:Place"
> > > > > > > > >       ],
> > > > > > > > >       "foaf:depiction": [
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > > > >       ],
> > > > > > > > >       "geo:lat": 48.8567,
> > > > > > > > >       "geo:long": 2.3508,
> > > > > > > > >       "rdfs:comment": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > > > > >       },
> > > > > > > > >       "rdfs:label": [
> > > > > > > > >         {
> > > > > > > > >           "@language": "en",
> > > > > > > > >           "@value": "Paris"
> > > > > > > > >         },
> > > > > > > > >         {
> > > > > > > > >           "@language": "fr",
> > > > > > > > >           "@value": "Paris"
> > > > > > > > >         }
> > > > > > > > >       ]
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > >       "@type": [
> > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > >       ],
> > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > > >       "enhancer:end": 5,
> > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:start": 0
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > > >       "@type": [
> > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > >       ],
> > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > >         "dbp-ont:Country",
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "schema:Country",
> > > > > > > > >         "schema:Place",
> > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > >         "owl:Thing"
> > > > > > > > >       ],
> > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > > >       "@type": [
> > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > >       ],
> > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Vichy France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > >         "dbp-ont:Country",
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "schema:Country",
> > > > > > > > >         "schema:Place",
> > > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > > >         "owl:Thing"
> > > > > > > > >       ],
> > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > > >       "@type": [
> > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > > >       ],
> > > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > > >       "enhancer:entity-label": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris Commune"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > > >       "enhancer:entity-type": [
> > > > > > > > >         "dbp-ont:Country",
> > > > > > > > >         "dbp-ont:Place",
> > > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > > >         "schema:Country",
> > > > > > > > >         "schema:Place",
> > > > > > > > >         "owl:Thing"
> > > > > > > > >       ],
> > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > > >     },
> > > > > > > > >     {
> > > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > > >       "@type": [
> > > > > > > > >         "enhancer:Enhancement",
> > > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > > >       ],
> > > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > > >       "enhancer:end": 18,
> > > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > > >       "enhancer:selected-text": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:selection-context": {
> > > > > > > > >         "@language": "en",
> > > > > > > > >         "@value": "Paris is in France"
> > > > > > > > >       },
> > > > > > > > >       "enhancer:start": 12
> > > > > > > > >     }
> > > > > > > > >   ]
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Dileepa,
> > > > > > > > > >
> > > > > > > > > > Repository connectors have an abstraction that allows them
> > > > > > > > > > to generate compound documents (where a document has a
> > > > > > > > > > primary identifier, and there are subdocuments that share
> > > > > > > > > > that primary identifier and have a secondary identifier).
> > > > > > > > > > This sounds a bit like what you are describing.  Does
> > > > > > > > > > Stanbol work by decorating an existing document, or does it
> > > > > > > > > > work by generating all content for a document?
> > > > > > > > > >
> > > > > > > > > > Karl
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi All,
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thank you all for your input on the Stanbol connector
> > > > > > > > > > > requirements. I would like to continue modifying the
> > > > > > > > > > > Stanbol connector so that it is compatible with any
> > > > > > > > > > > output connector. If you can give me some guidance on
> > > > > > > > > > > how the entity metadata should be added to the
> > > > > > > > > > > repository document, I can modify the Stanbol connector
> > > > > > > > > > > accordingly.
> > > > > > > > > > >
> > > > > > > > > > > From Rafa's comments, I gathered that we can add the
> > > > > > > > > > > entity metadata to the repository document as key-value
> > > > > > > > > > > pairs.
> > > > > > > > > > > However, this idea is not yet clear to me. There could be
> > > > > > > > > > > N entities in a document, and each of them will have some
> > > > > > > > > > > common attributes such as name, id and type, plus
> > > > > > > > > > > attributes specific to its entity type.
> > > > > > > > > > > I'm not clear on how to maintain that structure of N
> > > > > > > > > > > entities with their attributes in a repository document
> > > > > > > > > > > as key-value pairs and make them LDPath compatible for
> > > > > > > > > > > retrieval in an output connector.
> > > > > > > > > > >
> > > > > > > > > > > @Rafa
> > > > > > > > > > > If you could elaborate on your suggestion, it would be a
> > > > > > > > > > > great help to me.
> > > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Dileepa
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I, too, agree.  Somebody will need to turn this
> > > > > > > > > > > > connector into one that plays by the rules.  It may be
> > > > > > > > > > > > possible for someone on the team here to do that, but
> > > > > > > > > > > > it won't be me; I'm seriously overextended at the
> > > > > > > > > > > > moment.  It would be best if someone who knew the
> > > > > > > > > > > > connector well could do the necessary work.
> > > > > > > > > > > >
> > > > > > > > > > > > Karl
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I must agree with Antonio. When I started to work on
> > > > > > > > > > > > > this, I was expecting the connector to work by just
> > > > > > > > > > > > > extracting the entities and their metadata and
> > > > > > > > > > > > > putting them in the documents as plain metadata,
> > > > > > > > > > > > > probably following a configuration of LDPath queries.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > This is probably OK for Sensefy, but I don’t think it
> > > > > > > > > > > > > is suitable for inclusion in the project. But this is
> > > > > > > > > > > > > only my opinion. Of course, a version of the
> > > > > > > > > > > > > connector that fully respects the ManifoldCF
> > > > > > > > > > > > > architecture would be more than welcome.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > > > > > > > <adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi
> > > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > > > > > > > > > > > > > requirement for an internal project that has
> > > > > > > > > > > > > > nothing to do with the normal operation of
> > > > > > > > > > > > > > Manifold, and forcing users to use Solr does not
> > > > > > > > > > > > > > fit the Manifold philosophy.
> > > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector
> > > > > > > > > > > > > > with such a big dependency, which fits almost no
> > > > > > > > > > > > > > use case, is not very useful.
> > > > > > > > > > > > > > You should think of a way to convert the Stanbol
> > > > > > > > > > > > > > connector into a normal transformation connector,
> > > > > > > > > > > > > > without assuming that a specific output connector
> > > > > > > > > > > > > > will be used.
> > > > > > > > > > > > > > Regards
> > > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You
> > > > > > > > > > > > > >> can check it out from our GitHub repo here:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> It requires the SolrWrapper output connector, which
> > > > > > > > > > > > > >> indexes enhanced documents, entities and
> > > > > > > > > > > > > >> entityTypes in separate Solr cores. Basically, it
> > > > > > > > > > > > > >> requires 3 separate Solr cores, each configured
> > > > > > > > > > > > > >> with a specific Solr schema, for primary documents,
> > > > > > > > > > > > > >> entities and entityTypes respectively. This was
> > > > > > > > > > > > > >> done for our specific use case.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> The SolrWrapper code is here:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol connector's dependency on
> > > > > > > > > > > > > > >> SolrWrapper and have it working with any output connector.
> > > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a bug in the UI
> > > > > > > > > > > > > > >> (editSpecification) which I'm working on at the moment. After fixing that
> > > > > > > > > > > > > > >> I will update here. I will also provide documentation for configuring
> > > > > > > > > > > > > > >> the connector.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> > > > > > > > > > > > > > >> <adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already integrated in Apache
> > > > > > > > > > > > > > >> > Stanbol. You can take a look at this issue:
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Anyway, as I said this is not the list for that, so let's use the proper
> > > > > > > > > > > > > > >> > list for these things.
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > > > > > > joshua.dunham@gmail.com
> > > > > > > > > > > >:
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol it may be
> > > > > > > > > > > > > > >> > > easier for you to take this route.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > > > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > > > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an architecture
> > > > > > > > > > > > > > >> > > > diagram for the connector.
> > > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > > > > > > > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > > > > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> --
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> ------------------------------
> > > > > > > > > > > > > >> This message should be regarded as confidential.
> > If
> > > > you
> > > > > > have
> > > > > > > > > > > received
> > > > > > > > > > > > > this
> > > > > > > > > > > > > >> email in error please notify the sender and
> > destroy
> > > it
> > > > > > > > > > immediately.
> > > > > > > > > > > > > >> Statements of intent shall only become binding
> > when
> > > > > > > confirmed
> > > > > > > > in
> > > > > > > > > > > hard
> > > > > > > > > > > > > copy
> > > > > > > > > > > > > >> by an authorised signatory.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Zaizi Ltd is registered in England and Wales
> with
> > > the
> > > > > > > > > registration
> > > > > > > > > > > > > number
> > > > > > > > > > > > > >> 6440931. The Registered Office is Brook House,
> 229
> > > > > > Shepherds
> > > > > > > > > Bush
> > > > > > > > > > > > Road,
> > > > > > > > > > > > > >> London W6 7AN.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > >
> > > > > > > > > > > ------------------------------
> > > > > > > > > > > This message should be regarded as confidential. If you
> > > have
> > > > > > > received
> > > > > > > > > > this
> > > > > > > > > > > email in error please notify the sender and destroy it
> > > > > > immediately.
> > > > > > > > > > > Statements of intent shall only become binding when
> > > confirmed
> > > > > in
> > > > > > > hard
> > > > > > > > > > copy
> > > > > > > > > > > by an authorised signatory.
> > > > > > > > > > >
> > > > > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > > > > registration
> > > > > > > > > number
> > > > > > > > > > > 6440931. The Registered Office is Brook House, 229
> > > Shepherds
> > > > > Bush
> > > > > > > > Road,
> > > > > > > > > > > London W6 7AN.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > ------------------------------
> > > > > > > > > This message should be regarded as confidential. If you
> have
> > > > > received
> > > > > > > > this
> > > > > > > > > email in error please notify the sender and destroy it
> > > > immediately.
> > > > > > > > > Statements of intent shall only become binding when
> confirmed
> > > in
> > > > > hard
> > > > > > > > copy
> > > > > > > > > by an authorised signatory.
> > > > > > > > >
> > > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > > registration
> > > > > > > number
> > > > > > > > > 6440931. The Registered Office is Brook House, 229
> Shepherds
> > > Bush
> > > > > > Road,
> > > > > > > > > London W6 7AN.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > ------------------------------
> > > > > > > This message should be regarded as confidential. If you have
> > > received
> > > > > > this
> > > > > > > email in error please notify the sender and destroy it
> > immediately.
> > > > > > > Statements of intent shall only become binding when confirmed
> in
> > > hard
> > > > > > copy
> > > > > > > by an authorised signatory.
> > > > > > >
> > > > > > > Zaizi Ltd is registered in England and Wales with the
> > registration
> > > > > number
> > > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds
> Bush
> > > > Road,
> > > > > > > London W6 7AN.
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > >
> > > > > ------------------------------
> > > > > This message should be regarded as confidential. If you have
> received
> > > > this
> > > > > email in error please notify the sender and destroy it immediately.
> > > > > Statements of intent shall only become binding when confirmed in
> hard
> > > > copy
> > > > > by an authorised signatory.
> > > > >
> > > > > Zaizi Ltd is registered in England and Wales with the registration
> > > number
> > > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > Road,
> > > > > London W6 7AN.
> > > > >
> > > >
> > >
> >
> >
>

