manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF transformation connector for Apache Stanbol
Date Sat, 12 Dec 2015 11:56:08 GMT
Ok, it seems premature for me to try to import this from Github today, so
I'll wait until the dust settles a bit further first.

Karl


On Fri, Dec 11, 2015 at 1:45 PM, Dileepa Jayakody <djayakody@zaizi.com>
wrote:

> Thanks a lot, Rafa, for pointing that out. That was a big miss, as I haven't
> tested the LDPath configuration part yet. More improvements are needed.
> I will make the required improvements as pointed out.
>
> Regards,
> Dileepa
>
>
> On Fri, Dec 11, 2015 at 8:42 PM, Rafa Haro <rharo@apache.org> wrote:
>
> > Hi Dileepa,
> >
> > The problem is not in that part of the code; it is rather in this part:
> >
> > if (entity != null) {
> >     Collection<String> properties = entity.getProperties();
> >     for (String property : properties) {
> >         String targetFieldName = derefFields.get(property);
> >         Set<String> propValues = entityPropertyMap.get(targetFieldName);
> >         if (propValues == null) {
> >             propValues = new HashSet<String>();
> >         }
> >         Collection<String> entityPropValues = entity.getPropertyValues(property);
> >         propValues.addAll(entityPropValues);
> >         entityPropertyMap.put(targetFieldName, propValues);
> >     }
> > }
> >
> > You are collecting from the EnhancementStructure response only the
> > configured dereferenced fields; the LDPath fields are ignored. Also, there
> > is a potential bug in that code if no dereferencing field is
> > configured for a given entity property, here:
> >
> > String targetFieldName = derefFields.get(property);
> >
> > targetFieldName would be null then. Instead of trying to index every
> > property, you should collect only the ones configured by the user (or at
> > least, if the user wants all of them, provide a configuration option for
> > that).
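
One way to act on the null-key concern above is to skip any property that has no configured target field. The following is only a minimal sketch under stated assumptions: the entity is modelled as a plain map from property URI to values rather than the Stanbol client's entity class, and ConfiguredPropertyCollector and its collect method are hypothetical names that do not come from the connector.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the Stanbol connector.
final class ConfiguredPropertyCollector {

    /**
     * Copies only those entity properties that have a configured target field.
     *
     * @param entityProperties property URI -> values (stand-in for the entity object)
     * @param derefFields      property URI -> target field name, as configured by the user
     */
    static Map<String, Set<String>> collect(
            Map<String, ? extends Collection<String>> entityProperties,
            Map<String, String> derefFields) {
        Map<String, Set<String>> result = new HashMap<>();
        for (Map.Entry<String, ? extends Collection<String>> e : entityProperties.entrySet()) {
            String targetFieldName = derefFields.get(e.getKey());
            if (targetFieldName == null) {
                // No field configured for this property: skip it rather than
                // storing its values under a null key.
                continue;
            }
            result.computeIfAbsent(targetFieldName, k -> new LinkedHashSet<>())
                  .addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> entity = Map.of(
                "rdfs:label", List.of("Paris"),
                "foaf:depiction", List.of("http://example.org/paris.jpg"));
        Map<String, String> derefFields = Map.of("rdfs:label", "entity_label");
        System.out.println(collect(entity, derefFields));
    }
}
```

An "index everything" mode, if wanted, could be a separate configuration flag that falls back to the property name itself as the field name; that flag is likewise an assumption here.
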
> >
> > Anyway, going back to the LDPath issue, please take into account that when you
> > define a field you must use a custom namespace and prefix so that you can
> > later retrieve that property from the entity. If you don't do that,
> > Stanbol will assign a random namespace to that property. Check this
> > example from the RedLink SDK:
> >
> >
> > https://github.com/redlink-gmbh/redlink-java-sdk/blob/master/src/test/java/io/redlink/sdk/AnalysisTest.java#L423-443
> >
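
To make the namespace point concrete, here is a hedged sketch of building an LDPath program whose field lives under a prefix the connector controls, so the resulting property URI is predictable. The prefix "mcf", the namespace http://example.org/mcf/stanbol/ and the class name are all hypothetical; the sketch also assumes the LDPath implementation predefines the xsd prefix and accepts prefixed field names, which should be checked against the linked RedLink test.

```java
// Hypothetical example; the prefix and namespace are illustrative only.
final class LdPathProgramExample {

    static final String CUSTOM_PREFIX = "mcf";
    static final String CUSTOM_NAMESPACE = "http://example.org/mcf/stanbol/";

    /** Builds a one-field LDPath program that maps rdfs:label (English) to mcf:label. */
    static String program() {
        return "@prefix rdfs : <http://www.w3.org/2000/01/rdf-schema#> ;\n"
             + "@prefix " + CUSTOM_PREFIX + " : <" + CUSTOM_NAMESPACE + "> ;\n"
             + CUSTOM_PREFIX + ":label = rdfs:label[@en] :: xsd:string ;\n";
    }

    /** The full property URI under which the field value would later be looked up. */
    static String fieldUri(String localName) {
        return CUSTOM_NAMESPACE + localName;
    }

    public static void main(String[] args) {
        System.out.print(program());
        System.out.println(fieldUri("label"));
    }
}
```

Because the namespace is fixed by the connector, the same constant can be used both when building the program and when reading the property back from the entity.
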
> > Hope that helps
> >
> > On Fri, Dec 11, 2015 at 3:57 PM Karl Wright <daddywri@gmail.com> wrote:
> >
> > > The next step would be to pull this code into an svn branch.  This is
> > > something I can tackle after the 2.3 release candidate is put together.
> > >
> > > Thanks,
> > > Karl
> > >
> > >
> > > On Fri, Dec 11, 2015 at 9:07 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > wrote:
> > >
> > > > Hi Rafa,
> > > >
> > > > Thanks for reviewing my code and for your feedback. Please see my
> > > > comments inline below.
> > > >
> > > >
> > > > On Fri, Dec 11, 2015 at 6:51 PM, Rafa Haro <rharo@apache.org> wrote:
> > > >
> > > > > Hi Dileepa,
> > > > >
> > > > > This now clearly seems to be going in the right direction, in my opinion.
> > > > > Quick comments after a first review:
> > > > >
> > > > >
> > > > >    - Rejecting a document because it can't be enhanced is kind of tough.
> > > > >    You are preventing a document from being indexed because the
> > > > >    enhancement didn't perform correctly. It is probably better just to
> > > > >    let it continue through the workflow within the system.
> > > > >
> > > >
> > > > Got your point. Will remove that part from the code
> > > >
> > > >
> > > > >    - As far as I can deduce from the code, you are correctly extracting
> > > > >    the configured dereferenced fields, but you are not processing the
> > > > >    LDPath results at all
> > > > >
> > > > I'm passing the LDPath program as an enhancer parameter to Stanbol to
> > > > retrieve the enhancement result according to the LDPath program (which is
> > > > given as a text string in the connector UI).
> > > > If the user has not defined an LDPath program and has added dereference
> > > > fields in the UI instead, then the enhancement request will be built using
> > > > the dereference fields as enhancer parameters.
> > > >
> > > > If neither an LDPath program nor dereference fields are given in the
> > > > transformation UI, then I just call the given enhancement chain without
> > > > any other enhancer parameters.
> > > >
> > > > Please refer to the code segment below, where I do this, and let me know
> > > > if it needs more improvements.
> > > >
> > > >             // ldpath program is given priority if it's set
> > > >             if (ldPath != null)
> > > >             {
> > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > >                         .setLDpathProgram(ldPath).build();
> > > >             }
> > > >             else if (!derefFields.isEmpty())
> > > >             {
> > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content)
> > > >                         .setDereferencingFields(derefFields.keySet()).build();
> > > >             }
> > > >             else
> > > >             {
> > > >                 parameters = EnhancerParameters.builder().setChain(chain).setContent(content).build();
> > > >             }
> > > >             eRes = enhancerClient.enhance(parameters);
> > > >
> > > >
> > > > Thanks,
> > > > Dileepa
> > > >
> > > >
> > > > >
> > > > > Cheers,
> > > > > Rafa
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Fri, Dec 11, 2015 at 1:05 PM Dileepa Jayakody <djayakody@zaizi.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > As per our discussion, I have modified the Stanbol connector so that it
> > > > > > adds all extracted entity URIs and entity attributes to the repository
> > > > > > document as fields.
> > > > > >
> > > > > > I have committed this code on a separate branch of our GitHub project,
> > > > > > sensefy-connectors.
> > > > > > You can find the source code here:
> > > > > >
> > > > > > https://github.com/zaizi/sensefy-connectors/tree/feature/SENSEFY-1453-modify-stanbol-connector/transformation/mcf-stanbol-connector
> > > > > >
> > > > > > Let me know your feedback.
> > > > > >
> > > > > > I will write a blog post on how to add it to a connection and get
> > > > > > enhancement results, and share it with you.
> > > > > >
> > > > > > Thanks,
> > > > > > Dileepa
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Dec 7, 2015 at 6:29 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Dileepa,
> > > > > > >
> > > > > > > You cannot create sub-documents in a transformation connector.  And
> > > > > > > adding that capability to the framework is not possible; we would be
> > > > > > > missing key bookkeeping logic if that was allowed.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Dec 7, 2015 at 6:59 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Karl,
> > > > > > > >
> > > > > > > > Thanks a lot for the pointer.
> > > > > > > >
> > > > > > > > Stanbol doesn't update an existing document; it generates a new
> > > > > > > > response with the requested enhancement details for each content
> > > > > > > > enhancement request. For example, for a request like "Paris is a
> > > > > > > > city in France", the following RDF response [1] is given by Stanbol.
> > > > > > > >
> > > > > > > > In the Stanbol connector, enhancement artifacts such as
> > > > > > > > TextAnnotations and EntityAnnotations are extracted from the RDF
> > > > > > > > response to generate the entity abstractions and add them to the mcf
> > > > > > > > repository document. Currently, in the Stanbol connector, we add
> > > > > > > > these entity abstractions as JSON strings to a multi-valued
> > > > > > > > 'entities' field in the repository document, and we parse that JSON
> > > > > > > > in the SolrWrapper output connector to index into separate Solr
> > > > > > > > cores (primary documents, linked entities, and entity types with
> > > > > > > > their attributes).
> > > > > > > >
> > > > > > > > Can we have a primary repository document and create sub-documents
> > > > > > > > for the extracted entities? Is it possible to generate sub-documents
> > > > > > > > for a repository document in a transformation connector?
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > Dileepa
> > > > > > > >
> > > > > > > > [1] Sample Stanbol response
> > > > > > > >
> > > > > > > > {
> > > > > > > >   "@context": {
> > > > > > > >     "dbp-ont": "http://dbpedia.org/ontology/",
> > > > > > > >     "dc": "http://purl.org/dc/terms/",
> > > > > > > >     "dc:created": {
> > > > > > > >       "@type": "xsd:dateTime"
> > > > > > > >     },
> > > > > > > >     "enhancer": "http://fise.iks-project.eu/ontology/",
> > > > > > > >     "enhancer:confidence": {
> > > > > > > >       "@type": "xsd:double"
> > > > > > > >     },
> > > > > > > >     "enhancer:end": {
> > > > > > > >       "@type": "xsd:int"
> > > > > > > >     },
> > > > > > > >     "enhancer:entity-reference": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "enhancer:entity-type": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "enhancer:extracted-from": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "enhancer:start": {
> > > > > > > >       "@type": "xsd:int"
> > > > > > > >     },
> > > > > > > >     "entityhub": "http://stanbol.apache.org/ontology/entityhub/entityhub#",
> > > > > > > >     "foaf": "http://xmlns.com/foaf/0.1/",
> > > > > > > >     "foaf:depiction": {
> > > > > > > >       "@type": "@id"
> > > > > > > >     },
> > > > > > > >     "owl": "http://www.w3.org/2002/07/owl#",
> > > > > > > >     "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
> > > > > > > >     "schema": "http://schema.org/",
> > > > > > > >     "xsd": "http://www.w3.org/2001/XMLSchema#"
> > > > > > > >   },
> > > > > > > >   "@graph": [
> > > > > > > >     {
> > > > > > > >       "@id": "http://dbpedia.org/resource/France",
> > > > > > > >       "@type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place"
> > > > > > > >       ],
> > > > > > > >       "foaf:depiction": [
> > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/c/c3/Flag_of_France.svg",
> > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/c/c3/Flag_of_France.svg/200px-Flag_of_France.svg.png"
> > > > > > > >       ],
> > > > > > > >       "rdfs:comment": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "France, officially the French Republic, is a unitary semi-presidential republic in Western Europe with several overseas territories and islands located on other continents and in the Indian, Pacific, and Atlantic oceans. Metropolitan France extends from the Mediterranean Sea to the English Channel and the North Sea, and from the Rhine to the Atlantic Ocean. It is often referred to as l’Hexagone because of the geometric shape of its territory."
> > > > > > > >       },
> > > > > > > >       "rdfs:label": [
> > > > > > > >         {
> > > > > > > >           "@language": "en",
> > > > > > > >           "@value": "France"
> > > > > > > >         },
> > > > > > > >         {
> > > > > > > >           "@language": "fr",
> > > > > > > >           "@value": "France"
> > > > > > > >         }
> > > > > > > >       ]
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id": "http://dbpedia.org/resource/Paris",
> > > > > > > >       "@type": [
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "dbp-ont:Settlement",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing",
> > > > > > > >         "schema:Place"
> > > > > > > >       ],
> > > > > > > >       "foaf:depiction": [
> > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg",
> > > > > > > >         "http://upload.wikimedia.org/wikipedia/commons/thumb/6/6e/Paris_-_Eiffelturm_und_Marsfeld2.jpg/200px-Paris_-_Eiffelturm_und_Marsfeld2.jpg"
> > > > > > > >       ],
> > > > > > > >       "geo:lat": 48.8567,
> > > > > > > >       "geo:long": 2.3508,
> > > > > > > >       "rdfs:comment": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris is the capital and largest city of France. It is situated on the river Seine, in northern France, at the heart of the Île-de-France region (or Paris Region, French: Région parisienne). As of January 2008 the city of Paris, within its administrative limits largely unchanged since 1860, has an estimated population of 2,211,297 and a metropolitan population of 12,089,098, and is one of the most populated metropolitan areas in Europe."
> > > > > > > >       },
> > > > > > > >       "rdfs:label": [
> > > > > > > >         {
> > > > > > > >           "@language": "en",
> > > > > > > >           "@value": "Paris"
> > > > > > > >         },
> > > > > > > >         {
> > > > > > > >           "@language": "fr",
> > > > > > > >           "@value": "Paris"
> > > > > > > >         }
> > > > > > > >       ]
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > >       "enhancer:confidence": 0.6017613,
> > > > > > > >       "enhancer:end": 5,
> > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "enhancer:selected-text": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris"
> > > > > > > >       },
> > > > > > > >       "enhancer:selection-context": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris is in France"
> > > > > > > >       },
> > > > > > > >       "enhancer:start": 0
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id": "urn:enhancement-b2855552-0e46-62f5-cd33-9f84ab32e547",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > >       "enhancer:confidence": 1.0,
> > > > > > > >       "enhancer:entity-label": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "France"
> > > > > > > >       },
> > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/France",
> > > > > > > >       "enhancer:entity-type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing"
> > > > > > > >       ],
> > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id": "urn:enhancement-c50474e4-ea0e-03ff-5db5-a25f4c8dae45",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > >       "dc:relation": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > >       "enhancer:confidence": 0.25715446,
> > > > > > > >       "enhancer:entity-label": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Vichy France"
> > > > > > > >       },
> > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Vichy_France",
> > > > > > > >       "enhancer:entity-type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place",
> > > > > > > >         "http://www.opengis.net/gml/_Feature",
> > > > > > > >         "owl:Thing"
> > > > > > > >       ],
> > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id": "urn:enhancement-de07bc41-e4a1-f510-3f93-99ebfd8c39f4",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:EntityAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.748Z",
> > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine",
> > > > > > > >       "dc:relation": "urn:enhancement-8db13707-1ecd-b4df-90ad-52447c8f2c84",
> > > > > > > >       "enhancer:confidence": 0.1493264,
> > > > > > > >       "enhancer:entity-label": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris Commune"
> > > > > > > >       },
> > > > > > > >       "enhancer:entity-reference": "http://dbpedia.org/resource/Paris_Commune",
> > > > > > > >       "enhancer:entity-type": [
> > > > > > > >         "dbp-ont:Country",
> > > > > > > >         "dbp-ont:Place",
> > > > > > > >         "dbp-ont:PopulatedPlace",
> > > > > > > >         "schema:Country",
> > > > > > > >         "schema:Place",
> > > > > > > >         "owl:Thing"
> > > > > > > >       ],
> > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "entityhub:site": "dbpedia"
> > > > > > > >     },
> > > > > > > >     {
> > > > > > > >       "@id": "urn:enhancement-e9c9c187-2d69-2c1f-6552-e76111430d4a",
> > > > > > > >       "@type": [
> > > > > > > >         "enhancer:Enhancement",
> > > > > > > >         "enhancer:TextAnnotation"
> > > > > > > >       ],
> > > > > > > >       "dc:created": "2015-12-07T11:22:07.740Z",
> > > > > > > >       "dc:creator": "org.apache.stanbol.enhancer.engines.opennlp.impl.NamedEntityExtractionEnhancementEngine",
> > > > > > > >       "dc:type": "dbp-ont:Place",
> > > > > > > >       "enhancer:confidence": 0.99354976,
> > > > > > > >       "enhancer:end": 18,
> > > > > > > >       "enhancer:extracted-from": "urn:content-item-sha1-c8ae372ed26679df14da13050dd432fd32c527e3",
> > > > > > > >       "enhancer:selected-text": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "France"
> > > > > > > >       },
> > > > > > > >       "enhancer:selection-context": {
> > > > > > > >         "@language": "en",
> > > > > > > >         "@value": "Paris is in France"
> > > > > > > >       },
> > > > > > > >       "enhancer:start": 12
> > > > > > > >     }
> > > > > > > >   ]
> > > > > > > > }
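
In a response shaped like the sample [1], each EntityAnnotation points back to its TextAnnotation through dc:relation, and several candidate entities can compete for the same mention (France at 1.0 versus Vichy France at 0.257). The sketch below shows one possible way to keep only the highest-confidence candidate per mention. It assumes the @graph has already been parsed into plain maps by some JSON library, and BestEntityPerMention is a hypothetical name; this is not how the connector itself selects entities.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper operating on an already-parsed "@graph" array.
final class BestEntityPerMention {

    /**
     * Maps each TextAnnotation id (the "dc:relation" target) to the
     * "enhancer:entity-reference" of its highest-confidence EntityAnnotation.
     */
    static Map<String, String> select(List<Map<String, Object>> graph) {
        Map<String, String> bestRef = new HashMap<>();
        Map<String, Double> bestScore = new HashMap<>();
        for (Map<String, Object> node : graph) {
            Object types = node.get("@type");
            if (!(types instanceof List)
                    || !((List<?>) types).contains("enhancer:EntityAnnotation")) {
                continue; // entity descriptions and TextAnnotations are ignored
            }
            Object relation = node.get("dc:relation");
            Object ref = node.get("enhancer:entity-reference");
            Object conf = node.get("enhancer:confidence");
            if (!(relation instanceof String) || !(ref instanceof String)
                    || !(conf instanceof Number)) {
                continue; // skip annotations that lack the expected keys
            }
            String mention = (String) relation;
            double score = ((Number) conf).doubleValue();
            Double current = bestScore.get(mention);
            if (current == null || score > current) {
                bestScore.put(mention, score);
                bestRef.put(mention, ((String) ref).trim());
            }
        }
        return bestRef;
    }
}
```

A confidence threshold could be added in the same loop if low-scoring candidates such as "Paris Commune" (0.149) should be dropped altogether.
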
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Dec 7, 2015 at 4:23 PM, Karl Wright <daddywri@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Dileepa,
> > > > > > > > >
> > > > > > > > > Repository connectors have an abstraction that allows them to
> > > > > > > > > generate compound documents (where a document has a primary
> > > > > > > > > identifier, and there are subdocuments that share that primary
> > > > > > > > > identifier and have a secondary identifier).  This sounds a bit
> > > > > > > > > like what you are describing.  Does Stanbol work by decorating an
> > > > > > > > > existing document, or does it work by generating all content for
> > > > > > > > > a document?
> > > > > > > > >
> > > > > > > > > Karl
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Dec 7, 2015 at 5:12 AM, Dileepa Jayakody <djayakody@zaizi.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi All,
> > > > > > > > > >
> > > > > > > > > > While thanking you all for your input on the Stanbol connector
> > > > > > > > > > requirement, I would like to continue modifying the Stanbol
> > > > > > > > > > connector to be compatible with any output connector. If you
> > > > > > > > > > guys can give some guidance on how the entity metadata should
> > > > > > > > > > be added to the repository document, I can modify the Stanbol
> > > > > > > > > > connector accordingly.
> > > > > > > > > >
> > > > > > > > > > From Rafa's comments, I gathered we can add the entity metadata
> > > > > > > > > > to the repo.doc as key-value pairs.
> > > > > > > > > > However, this idea is not yet clear to me. There could be N
> > > > > > > > > > entities in a document, and each of them will have some common
> > > > > > > > > > attributes such as name, id and type, plus attributes specific
> > > > > > > > > > to its entity type.
> > > > > > > > > > I'm not clear on how to maintain that structure of N entities
> > > > > > > > > > with their attributes in a repo.document as key-value pairs and
> > > > > > > > > > make them LDPath compatible for retrieval in an output connector.
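
One possible reading of the key-value idea is to flatten all N entities into a small set of multi-valued fields: one field listing the entity URIs, plus one field per attribute name holding the values from every entity. The sketch below illustrates that under stated assumptions. The field name entity_uri and the class EntityFieldFlattener are hypothetical, and a plain map stands in for the repository document. Note the trade-off: the flattened form no longer records which value belongs to which entity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; a Map stands in for the repository document's fields.
final class EntityFieldFlattener {

    static final String ENTITY_URI_FIELD = "entity_uri"; // illustrative field name

    /**
     * @param entities entity URI -> (attribute name -> values)
     * @return field name -> de-duplicated values across all entities
     */
    static Map<String, List<String>> flatten(Map<String, Map<String, List<String>>> entities) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, List<String>>> entity : entities.entrySet()) {
            fields.computeIfAbsent(ENTITY_URI_FIELD, k -> new ArrayList<>())
                  .add(entity.getKey());
            for (Map.Entry<String, List<String>> attr : entity.getValue().entrySet()) {
                List<String> values = fields.computeIfAbsent(attr.getKey(), k -> new ArrayList<>());
                for (String v : attr.getValue()) {
                    if (!values.contains(v)) {
                        values.add(v); // keep first occurrence, preserve order
                    }
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> entities = new LinkedHashMap<>();
        entities.put("http://dbpedia.org/resource/Paris", Map.of("rdfs:label", List.of("Paris")));
        entities.put("http://dbpedia.org/resource/France", Map.of("rdfs:label", List.of("France")));
        System.out.println(flatten(entities));
    }
}
```

If the per-entity grouping matters to an output connector, the attribute field names could instead be derived from the entity type or from user-configured LDPath field names; that choice is left open here.
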
> > > > > > > > > >
> > > > > > > > > > @Rafa
> > > > > > > > > > If you could elaborate on your suggestion, it would be greatly
> > > > > > > > > > helpful to me.
> > > > > > > > > > All other suggestions are also welcome.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dileepa
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 13, 2015 at 7:00 PM, Karl Wright <daddywri@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > I, too, agree.  Somebody will need to turn this connector
> > > > > > > > > > > into one that plays by the rules.  It may be possible for
> > > > > > > > > > > someone on the team here to do that, but it won't be me; I'm
> > > > > > > > > > > seriously overextended at the moment.  It would be best if
> > > > > > > > > > > someone who knew the connector well could do the necessary
> > > > > > > > > > > work.
> > > > > > > > > > >
> > > > > > > > > > > Karl
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Nov 13, 2015 at 5:45 AM, Rafa Haro <rharoapache@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > I must agree with Antonio. When I started to work on this
> > > > > > > > > > > > I was expecting the connector to work by just extracting
> > > > > > > > > > > > the entities and their metadata and adding them as plain
> > > > > > > > > > > > metadata of the documents, probably driven by configured
> > > > > > > > > > > > LDPath queries.
> > > > > > > > > > > >
> > > > > > > > > > > > This is probably OK for Sensefy, but I don't think it is
> > > > > > > > > > > > suitable for inclusion in the project. But this is only my
> > > > > > > > > > > > opinion. Of course, a version of the connector that fully
> > > > > > > > > > > > respects the ManifoldCF architecture would be more than
> > > > > > > > > > > > welcome, in my opinion.
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Nov 13, 2015 at 11:38 AM, Antonio David Pérez Morales
> > > > > > > > > > > > <adperezmorales@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi
> > > > > > > > > > > > > The removal of the SolrWrapper is a must. It was a
> > > > > > > > > > > > > requirement for an internal project, which has nothing
> > > > > > > > > > > > > to do with the normal operation of Manifold, so forcing
> > > > > > > > > > > > > users to use Solr does not fit the Manifold philosophy.
> > > > > > > > > > > > > In my opinion, at this moment, a Stanbol connector with
> > > > > > > > > > > > > such a big dependency, which will not fit almost any use
> > > > > > > > > > > > > case, is not very useful.
> > > > > > > > > > > > > You should think of a way to convert the Stanbol
> > > > > > > > > > > > > connector into a normal transformation connector without
> > > > > > > > > > > > > assuming that a specific output connector will be used.
> > > > > > > > > > > > > Regards
> > > > > > > > > > > > > 2015-11-13 11:20 GMT+01:00 Dileepa Jayakody <djayakody@zaizi.com>:
> > > > > > > > > > > > > >> Hi guys,
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> I have developed a Stanbol connector for MCF. You can
> > > > > > > > > > > > > >> check it out from our GitHub repo here:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/transformation/mcf-stanbol-connector
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> It requires the SolrWrapper output connector, which
> > > > > > > > > > > > > >> indexes enhanced documents, entities and entityTypes
> > > > > > > > > > > > > >> in separate Solr cores. Basically, it requires 3
> > > > > > > > > > > > > >> separate Solr cores, each configured with a specific
> > > > > > > > > > > > > >> Solr schema, for primary documents, entities and
> > > > > > > > > > > > > >> entityTypes respectively. This was done for our
> > > > > > > > > > > > > >> specific use case.
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> The SolrWrapper code is here:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> https://github.com/zaizi/sensefy-connectors/tree/master/output/mcf-solrwrapperconnector
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> Perhaps we can discuss and remove the Stanbol
> > > > > > > > > > > > > >> connector's dependency on SolrWrapper and have it work
> > > > > > > > > > > > > >> with any output connector.
> > > > > > > > > > > > > >> Please note that the Stanbol connector currently has a
> > > > > > > > > > > > > >> bug in the UI (editSpecification), which I'm working
> > > > > > > > > > > > > >> on at the moment. After fixing that I will post an
> > > > > > > > > > > > > >> update here. I will also provide documentation for
> > > > > > > > > > > > > >> configuring the connector.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Thanks,
> > > > > > > > > > > > >> Dileepa
> > > > > > > > > > > > >>
> > > > > > > > > > > > > >> On Thu, Jul 9, 2015 at 8:36 PM, Antonio David Pérez Morales
> > > > > > > > > > > > > >> <adperezmorales@gmail.com> wrote:
> > > > > > > > > > > > > >>
> > > > > > > > > > > > > >> > Hi Joshua
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > It is not the list for that, but Marmotta is already
> > > > > > > > > > > > > >> > integrated in Apache Stanbol. You can take a look at
> > > > > > > > > > > > > >> > this issue:
> > > > > > > > > > > > > >> > https://issues.apache.org/jira/browse/STANBOL-1165
> > > > > > > > > > > > > >> >
> > > > > > > > > > > > > >> > Anyway, as I said, this is not the list for that, so
> > > > > > > > > > > > > >> > let's use the proper list for these things.
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > Regards
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > 2015-07-09 15:29 GMT+02:00 Joshua Dunham <
> > > > > > > > > joshua.dunham@gmail.com
> > > > > > > > > > >:
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > > Hey Dileepa,
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > In case you were interested, I pinged the list a few days ago asking
> > > > > > > > > > > > > >> > > for integration tips for Apache Marmotta.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > I got some great tips on how to do this which could help you. Since
> > > > > > > > > > > > > >> > > Marmotta is a drop-in replacement for Clerezza on Stanbol it may be easier
> > > > > > > > > > > > > >> > > for you to take this way.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > I'm not a Java programmer but I'm bringing this problem to the development
> > > > > > > > > > > > > >> > > staff at my company for assistance. If you like the Marmotta approach we
> > > > > > > > > > > > > >> > > may gain more traction solving the same integration.
> > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > I'm also integrating Marmotta with Stanbol so the effect would be the same
> > > > > > > > > > > > > >> > > except not using the Stanbol API for data import in favor of Marmotta.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > Best,
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > -J
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > > >> > > > On Jul 9, 2015, at 1:03 AM, Dileepa Jayakody <djayakody@zaizi.com> wrote:
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Hi all,
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Thank you for the feedback and for offering your help with this.
> > > > > > > > > > > > > >> > > > Let me get back to you on where to start the code base.
> > > > > > > > > > > > > >> > > > As the first step, I would like to start by creating an architecture diagram
> > > > > > > > > > > > > >> > > > for the connector.
> > > > > > > > > > > > > >> > > > I will send the diagram for your review soon.
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > Thanks,
> > > > > > > > > > > > >> > > > Dileepa
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > --
> > > > > > > > > > > > >> > > >
> > > > > > > > > > > > >> > > > ------------------------------
> > > > > > > > > > > > > >> > > > This message should be regarded as confidential. If you have received this
> > > > > > > > > > > > > >> > > > email in error please notify the sender and destroy it immediately.
> > > > > > > > > > > > > >> > > > Statements of intent shall only become binding when confirmed in hard copy
> > > > > > > > > > > > > >> > > > by an authorised signatory.
> > > > > > > > > > > > > >> > > >
> > > > > > > > > > > > > >> > > > Zaizi Ltd is registered in England and Wales with the registration number
> > > > > > > > > > > > > >> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road,
> > > > > > > > > > > > > >> > > > London W6 7AN.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> --
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> ------------------------------
> > > > > > > > > > > > >> This message should be regarded as confidential.
> If
> > > you
> > > > > have
> > > > > > > > > > received
> > > > > > > > > > > > this
> > > > > > > > > > > > >> email in error please notify the sender and
> destroy
> > it
> > > > > > > > > immediately.
> > > > > > > > > > > > >> Statements of intent shall only become binding
> when
> > > > > > confirmed
> > > > > > > in
> > > > > > > > > > hard
> > > > > > > > > > > > copy
> > > > > > > > > > > > >> by an authorised signatory.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Zaizi Ltd is registered in England and Wales with
> > the
> > > > > > > > registration
> > > > > > > > > > > > number
> > > > > > > > > > > > >> 6440931. The Registered Office is Brook House, 229
> > > > > Shepherds
> > > > > > > > Bush
> > > > > > > > > > > Road,
> > > > > > > > > > > > >> London W6 7AN.
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > >
> > > > > > > > > > ------------------------------
> > > > > > > > > > This message should be regarded as confidential. If you
> > have
> > > > > > received
> > > > > > > > > this
> > > > > > > > > > email in error please notify the sender and destroy it
> > > > > immediately.
> > > > > > > > > > Statements of intent shall only become binding when
> > confirmed
> > > > in
> > > > > > hard
> > > > > > > > > copy
> > > > > > > > > > by an authorised signatory.
> > > > > > > > > >
> > > > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > > > registration
> > > > > > > > number
> > > > > > > > > > 6440931. The Registered Office is Brook House, 229
> > Shepherds
> > > > Bush
> > > > > > > Road,
> > > > > > > > > > London W6 7AN.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > ------------------------------
> > > > > > > > This message should be regarded as confidential. If you have
> > > > received
> > > > > > > this
> > > > > > > > email in error please notify the sender and destroy it
> > > immediately.
> > > > > > > > Statements of intent shall only become binding when confirmed
> > in
> > > > hard
> > > > > > > copy
> > > > > > > > by an authorised signatory.
> > > > > > > >
> > > > > > > > Zaizi Ltd is registered in England and Wales with the
> > > registration
> > > > > > number
> > > > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds
> > Bush
> > > > > Road,
> > > > > > > > London W6 7AN.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > ------------------------------
> > > > > > This message should be regarded as confidential. If you have
> > received
> > > > > this
> > > > > > email in error please notify the sender and destroy it
> immediately.
> > > > > > Statements of intent shall only become binding when confirmed in
> > hard
> > > > > copy
> > > > > > by an authorised signatory.
> > > > > >
> > > > > > Zaizi Ltd is registered in England and Wales with the
> registration
> > > > number
> > > > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> > > Road,
> > > > > > London W6 7AN.
> > > > > >
> > > > >
> > > >
> > > > --
> > > >
> > > > ------------------------------
> > > > This message should be regarded as confidential. If you have received
> > > this
> > > > email in error please notify the sender and destroy it immediately.
> > > > Statements of intent shall only become binding when confirmed in hard
> > > copy
> > > > by an authorised signatory.
> > > >
> > > > Zaizi Ltd is registered in England and Wales with the registration
> > number
> > > > 6440931. The Registered Office is Brook House, 229 Shepherds Bush
> Road,
> > > > London W6 7AN.
> > > >
> > >
> >
>
>
