Date: Sun, 1 Sep 2013 20:56:13 +0300
Subject: Re: Relation extraction feature
From: Cristian Petroaca
To: dev@stanbol.apache.org

Related to the Stanford Dependency Tree feature, this is what the output from the tool looks like for the sentence "Mary and Tom met Danny today":

2013/8/30 Cristian Petroaca

> Hi Rupert,
>
> OK, so after looking at the JSON output from the Stanford NLP Server and the coref module, I think I can represent the coreference information this way:
> Each "Token" or "Chunk" will contain an additional coref annotation with the following structure:
>
> "stanbol.enhancer.nlp.coref" {
>     "tag" : // does this need to exist?
>     "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
>     "mentions" : [ { "sentenceNo" : 1, // the sentence in which the mention is found
>                      "startWord" : 2, // the first word making up the mention
>                      "endWord" : 3 // the last word making up the mention
>                    }, ...
>                  ],
>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> }
>
> The CorefTag should resemble this model.
>
> What do you think?
>
> Cristian
>
> 2013/8/24 Rupert Westenthaler
>
>> Hi Cristian,
>>
>> You cannot directly call StanfordNLP components from Stanbol; instead you have to extend the RESTful service to include the information you need.
>> The main reason for that is that the license of StanfordNLP is not compatible with the Apache Software License, so Stanbol cannot link directly to the StanfordNLP API.
>>
>> You will need to
>>
>> 1. define an additional class {yourTag} extends Tag<{yourType}> in the o.a.s.enhancer.nlp module
>> 2. add JSON parsing and serialization support for this tag to the o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>
>> As (1) would be necessary anyway, the only additional thing you need to develop is (2). After that you can add {yourTag} instances to the AnalyzedText in the StanfordNLP integration. The RestfulNlpAnalysisEngine will parse them from the response. All engines executed after the RestfulNlpAnalysisEngine will have access to your annotations.
>>
>> If you have a design for {yourTag} - the model you would like to use to represent your data - I can help with (1) and (2).
>>
>> best
>> Rupert
>>
>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca wrote:
>> > Hi Rupert,
>> >
>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see that Stanford NLP is not implemented as an EnhancementEngine but rather is used directly in a Jetty server instance. How does that fit into the Stanbol stack? For example, how can I call the StanfordNlpAnalyzer's routine from my TripleExtractionEnhancementEngine, which lives in the Stanbol stack?
>> >
>> > Thanks,
>> > Cristian
>> >
>> > 2013/8/12 Rupert Westenthaler
>> >
>> >> Hi Cristian,
>> >>
>> >> Sorry for the late response, but I was offline for the last two weeks.
>> >>
>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca wrote:
>> >> > Hi Rupert,
>> >> >
>> >> > After doing some tests it seems that the Stanford NLP coreference module is much more accurate than the Open NLP one. So I decided to extend Stanford NLP to add coreference there.
>> >>
>> >> The Stanford NLP integration is not part of the Stanbol codebase because the licenses are not compatible.
>> >>
>> >> You can find the Stanford NLP integration on
>> >>
>> >> https://github.com/westei/stanbol-stanfordnlp
>> >>
>> >> Just create a fork and send pull requests.
>> >>
>> >> > Could you add the necessary projects on the branch? And also remove the Open NLP ones?
>> >>
>> >> Currently the branch
>> >>
>> >> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>
>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should be enough for adding coreference support.
>> >>
>> >> IMO you will need to
>> >>
>> >> * add a model for representing coreference to the nlp module
>> >> * add parsing and serializing support to the nlp-json module
>> >> * add the implementation to your fork of the stanbol-stanfordnlp project
>> >>
>> >> best
>> >> Rupert
>> >>
>> >> > Thanks,
>> >> > Cristian
>> >> >
>> >> > 2013/7/5 Rupert Westenthaler
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> I created the branch at
>> >> >>
>> >> >> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >> >>
>> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know if you would like to have more.
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca wrote:
>> >> >> > Hi Rupert,
>> >> >> >
>> >> >> > I created JIRA issues https://issues.apache.org/jira/browse/STANBOL-1132 and https://issues.apache.org/jira/browse/STANBOL-1133. The original one is dependent upon these.
>> >> >> > Please let me know when I can start using the branch.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Cristian
>> >> >> >
>> >> >> > 2013/6/27 Cristian Petroaca
>> >> >> >
>> >> >> >> 2013/6/27 Rupert Westenthaler
>> >> >> >>
>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca wrote:
>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford, in my previous e-mail. By the way, does Open NLP have the ability to build dependency trees?
>> >> >> >>>
>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>> >> >> >>
>> >> >> >> Then, since the Stanford NLP lib is also integrated into Stanbol, I'll take a look at how I can extend its integration to include the dependency tree feature.
>> >> >> >>
>> >> >> >>> > 2013/6/23 Cristian Petroaca
>> >> >> >>> >
>> >> >> >>> >> Hi Rupert,
>> >> >> >>> >>
>> >> >> >>> >> I created the JIRA issue https://issues.apache.org/jira/browse/STANBOL-1121.
>> >> >> >>> >> As you suggested, I would start with extending Stanford NLP with co-reference resolution, but I think also with dependency trees, because I also need to know the subject of the sentence and the object that it affects, right?
>> >> >> >>> >>
>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol for co-reference and dependency trees, how do I proceed with this? Do I create 2 new sub-tasks of the already opened JIRA issue? After that, can I start implementing on my local copy of Stanbol and, when I'm done, send you guys the patch for review?
>> >> >> >>>
>> >> >> >>> I would create two "New Feature" type issues, one for adding support for "dependency trees" and the other for "co-reference" support. You should also define "depends on" relations between STANBOL-1121 and those two new issues.
>> >> >> >>>
>> >> >> >>> Sub-tasks could also work, but as adding those features would also be interesting for other things, I would rather define them as separate issues.
>> >> >> >>
>> >> >> >> 2 New Features connected with the original JIRA issue it is then.
>> >> >> >>
>> >> >> >>> If you would prefer to work in your own branch please tell me. This could have the advantage that patches would not be affected by changes in the trunk.
>> >> >> >>
>> >> >> >> Yes, a separate branch sounds good.
>> >> >> >>
>> >> >> >>> best
>> >> >> >>> Rupert
>> >> >> >>>
>> >> >> >>> >> Regards,
>> >> >> >>> >> Cristian
>> >> >> >>> >>
>> >> >> >>> >> 2013/6/18 Rupert Westenthaler
>> >> >> >>> >>
>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca wrote:
>> >> >> >>> >>> > Hi Rupert,
>> >> >> >>> >>> >
>> >> >> >>> >>> > Agreed on the SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation data structure.
>> >> >> >>> >>> >
>> >> >> >>> >>> > Should I open a JIRA issue for all of this in order to capture this information and establish the goals and the initial steps towards them?
>> >> >> >>> >>>
>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>> >> >> >>> >>>
>> >> >> >>> >>> > How should I proceed further? Should I create some design documents that need to be reviewed?
>> >> >> >>> >>>
>> >> >> >>> >>> Usually it is best to write design-related text directly in JIRA using Markdown [1] syntax. This will allow us to use this text later directly for the documentation on the Stanbol webpage.
>> >> >> >>> >>>
>> >> >> >>> >>> best
>> >> >> >>> >>> Rupert
>> >> >> >>> >>>
>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>> >> >> >>> >>>
>> >> >> >>> >>> > Regards,
>> >> >> >>> >>> > Cristian
>> >> >> >>> >>> >
>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >> >>> >>> >
>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca wrote:
>> >> >> >>> >>> >> > Hi Rupert,
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > First of all, thanks for the detailed suggestions.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> >> Hi Cristian, all
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> really interesting use case!
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could work out. These suggestions are mainly based on experiences and lessons learned in the LIVE [2] project, where we built an information system for the Olympic Games in Peking.
>> >> >> >>> >>> >> >> While this project excluded the extraction of Events from unstructured text (because the Olympic Information System was already providing event data as XML messages), the semantic search capabilities of this system were very similar to the ones described by your use case.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, but a formal representation of the situation described by the text. So let's assume that the goal is to annotate a Setting (or Situation) described in the text - a fise:SettingAnnotation.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to model those. The important relation for modeling this is Participation:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> where ..
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> * ED are Endurants (continuants): Endurants do have an identity, so we would typically refer to them as Entities referenced by a setting. Note that this includes physical, non-physical as well as social objects.
>> >> >> >>> >>> >> >> * PD are Perdurants (occurrents): Perdurants are entities that happen in time. This refers to Events, Activities ...
>> >> >> >>> >>> >> >> * PC are Participations: a time-indexed relation where Endurants participate in Perdurants
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Modeling this in RDF requires defining some intermediate resources, because RDF does not allow for n-ary relations.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> * fise:SettingAnnotation: It is really handy to define one resource being the context for all described data. I would call this "fise:SettingAnnotation" and define it as a sub-concept of fise:Enhancement. All further enhancements about the extracted Setting would define a "fise:in-setting" relation to it.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> * fise:ParticipantAnnotation: Is used to annotate that an Endurant is participating in a setting (fise:in-setting fise:SettingAnnotation). The Endurant itself is described by existing fise:TextAnnotation (the mentions) and fise:EntityAnnotation (suggested Entities). Basically the fise:ParticipantAnnotation will allow an EnhancementEngine to state that several mentions (possibly in different sentences) represent the same Endurant as participating in the Setting. In addition it would be possible to use the dc:type property (similar as for fise:TextAnnotation) to refer to the role(s) of a participant (e.g. the set: Agent (intentionally performs an action), Cause (unintentionally, e.g.
>> >> >> >>> >>> >> >>   a mud slide), Patient (a passive role in an activity) and Instrument (aids a process)), but I am wondering if one could extract that information.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: Is used to annotate a Perdurant in the context of the Setting. A fise:OccurrentAnnotation can also link to fise:TextAnnotation (typically verbs in the text defining the Perdurant) as well as fise:EntityAnnotation suggesting well-known Events in a knowledge base (e.g. an election in a country, or an uprising ...). In addition, fise:OccurrentAnnotation can define dc:has-participant links to fise:ParticipantAnnotation. In this case it is explicitly stated that an Endurant (the fise:ParticipantAnnotation) is involved in this Perdurant (the fise:OccurrentAnnotation). As Occurrences are temporally indexed, this annotation should also support properties for defining the xsd:dateTime for the start/end.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > Indeed, an event-based data structure makes a lot of sense, with the remark that you probably won't always be able to extract the date for a given setting (situation).
>> >> >> >>> >>> >> > There are 2 things which are unclear though.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 1.
>> >> >> >>> >>> >> > Perdurant: You could have situations in which the object upon which the Subject (or Endurant) is acting is not a transitory object (such as an event or activity) but rather another Endurant. For example, we can have the phrase "USA invades Irak", where "USA" is the Endurant (Subject) which performs the action of "invading" on another Endurant, namely "Irak".
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both are Endurants. The activity "invading" would be the Perdurant. So ideally you would have a "fise:SettingAnnotation" with:
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * fise:ParticipantAnnotation for USA with the dc:type caos:Agent, linking to a fise:TextAnnotation for "USA" and a fise:EntityAnnotation linking to dbpedia:United_States
>> >> >> >>> >>> >> * fise:ParticipantAnnotation for Iraq with the dc:type caos:Patient, linking to a fise:TextAnnotation for "Irak" and a fise:EntityAnnotation linking to dbpedia:Iraq
>> >> >> >>> >>> >> * fise:OccurrentAnnotation for "invades" with the dc:type caos:Activity, linking to a fise:TextAnnotation for "invades"
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and the Object, come into this?
>> >> >> >>> >>> >> > I imagined that the Endurant would have a dc:"property", where the property = verb which links to the Object in noun form. For example, take again the sentence "USA invades Irak". You would have the "USA" Entity with dc:invader, which points to the Object "Irak". The Endurant would have as many dc:"property" elements as there are verbs which link it to an Object.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> As explained above, you would have a fise:OccurrentAnnotation that represents the Perdurant. The information that the activity mentioned in the text is "invades" would be expressed by linking to a fise:TextAnnotation. If you can also provide an ontology for Tasks that defines "myTasks:invade", the fise:OccurrentAnnotation could also link to a fise:EntityAnnotation for this concept.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> best
>> >> >> >>> >>> >> Rupert
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> >> ### Consuming the data:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> I think this model should be sufficient for use cases as described by you.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Users would be able to consume data on the setting level. This can be done by simply retrieving all fise:ParticipantAnnotation as well as fise:OccurrentAnnotation linked with a setting.
>> >> >> >>> >>> >> >> BTW this was the approach used in LIVE [2] for semantic search. It allows queries for Settings that involve specific Entities, e.g. you could filter for Settings that involve a {Person}, activities:Arrested and a specific {Uprising}. However, note that with this approach you will also get results for Settings where the {Person} participated and another person was arrested.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Another possibility would be to process enhancement results on the level of the fise:OccurrentAnnotation. This would allow a much finer level of granularity (e.g. it would allow to correctly answer the query used as an example above). But I am wondering if the quality of the Setting extraction will be sufficient for this. I also have doubts whether this can still be realized by semantic indexing to Apache Solr, or if it would be better/necessary to store results in a TripleStore and use SPARQL for retrieval.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] is also very relevant for this (especially note chapter 7, SPOTL(X) Representation).
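The difference between the two retrieval options quoted above (matching on the Setting as a whole versus matching on a single fise:OccurrentAnnotation) can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions: the classes `Occurrent` and `Setting`, their methods, and the names "Alice", "Bob" and "Protested" are hypothetical and are not part of Stanbol or of the model proposed in the thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a fise:OccurrentAnnotation: an activity plus its participants. */
final class Occurrent {
    final String activity;
    final Set<String> participants;

    Occurrent(String activity, String... participants) {
        this.activity = activity;
        this.participants = new HashSet<>(Arrays.asList(participants));
    }
}

/** Hypothetical stand-in for a fise:SettingAnnotation grouping several occurrents. */
final class Setting {
    final List<Occurrent> occurrents;

    Setting(Occurrent... occurrents) {
        this.occurrents = Arrays.asList(occurrents);
    }

    /** Setting-level match: entity and activity both appear somewhere in the setting. */
    boolean matchesOnSettingLevel(String entity, String activity) {
        boolean entitySeen = false;
        boolean activitySeen = false;
        for (Occurrent o : occurrents) {
            entitySeen |= o.participants.contains(entity);
            activitySeen |= o.activity.equals(activity);
        }
        return entitySeen && activitySeen;
    }

    /** Occurrent-level match: the entity takes part in that very activity. */
    boolean matchesOnOccurrentLevel(String entity, String activity) {
        for (Occurrent o : occurrents) {
            if (o.activity.equals(activity) && o.participants.contains(entity)) {
                return true;
            }
        }
        return false;
    }
}

class SettingQueryDemo {
    public static void main(String[] args) {
        // Alice protested; Bob (not Alice) was arrested in the same setting.
        Setting setting = new Setting(
                new Occurrent("Protested", "Alice"),
                new Occurrent("Arrested", "Bob"));
        System.out.println(setting.matchesOnSettingLevel("Alice", "Arrested"));   // true
        System.out.println(setting.matchesOnOccurrentLevel("Alice", "Arrested")); // false
    }
}
```

The setting-level check returns the false positive described in the quoted mail (the person participated, but somebody else was arrested), while the occurrent-level check does not. Whether such a check would run against Solr or a triple store is left open, as in the original discussion.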
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities (especially Events) in knowledge bases based on Settings extracted from documents. By definition - in DOLCE - Perdurants are temporally indexed. That means that at the time they are added to a knowledge base they might still be in progress. So the creation, enrichment and refinement of such Entities in the knowledge base seems to be critical for a system like the one described in your use case.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca wrote:
>> >> >> >>> >>> >> >> >
>> >> >> >>> >>> >> >> > First of all, I have to mention that I am new to the field of semantic technologies; I started to read about them in the last 4-5 months. Having said that, I have a high-level overview of what a good approach to solving this problem is. There are a number of papers on the internet which describe what steps need to be taken, such as named entity recognition, co-reference resolution, POS tagging and others.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only supports sentence detection, tokenization, POS tagging, chunking, NER and lemma.
>> >> >> >>> >>> >> >> Support for co-reference resolution and dependency trees is currently missing.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the moment it only supports English, but I am already working on including the other supported languages. Other NLP frameworks that are already integrated with Stanbol are Freeling [5] and Talismane [6]. But note that for all of those the integration excludes support for co-reference and dependency trees.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Anyway, I am confident that one can implement a first prototype by only using Sentences and POS tags and - if available - Chunks (e.g. noun phrases).
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like relation extraction would be implemented as an EnhancementEngine?
>> >> >> >>> >>> >> > What kind of effort would be required to integrate a co-reference resolution tool into Stanbol?
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Yes, in the end it would be an EnhancementEngine.
>> >> >> >>> >>> >> But before we can build such an engine we would need to
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with annotations for co-reference
>> >> >> >>> >>> >> * add support for JSON serialisation/parsing for those annotations so that the RESTful NLP Analysis Service can provide co-reference information
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the extracted information. I'll take a closer look at DOLCE.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to represent Events will only pay off if we can also successfully extract such information from processed texts.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> I would start with
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * fise:SettingAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * fise:ParticipantAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if there are more suggestions)
>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument, fise:Cause
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * fise:OccurrentAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> If it turns out that we can extract more, we can add more structure to those annotations. We might also think about using a separate namespace for those extensions to the annotation structure.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > 2. Determine how all of this should be integrated into Stanbol.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an enhancement chain that does NLP processing and EntityLinking.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> You should have a look at
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * SentimentSummarizationEngine [1], as it does a lot of things with NLP processing results (e.g. connecting adjectives (via verbs) to nouns/pronouns).
>> >> >> >>> >>> >>   So as long as we cannot use explicit dependency trees, your code will need to do similar things with Nouns, Pronouns and Verbs.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java representation of the present fise:TextAnnotation and fise:EntityAnnotation [2]. Something similar will also be required by the EventExtractionEngine for fast access to such annotations while iterating over the Sentences of the text.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> best
>> >> >> >>> >>> >> Rupert
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>> >> >> >>> >>> >> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > Thanks
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> >> Hope this helps to bootstrap this discussion
>> >> >> >>> >>> >> >> best
>> >> >> >>> >>> >> >> Rupert
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> --
>> >> >> >>> >>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> >>> >>> >> >> | Bodenlehenstraße 11             ++43-699-11108907
>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
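The coreference annotation proposed at the top of this thread (an "isRepresentative" flag plus a list of mentions, each with "sentenceNo", "startWord" and "endWord") could be modelled roughly as follows. This is a minimal sketch under stated assumptions: `MentionSpan`, `CorefFeature` and `toJson()` are hypothetical names, and the classes deliberately do not extend Stanbol's `Tag` class or use the nlp-json module, since their exact signatures are not given in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical span of one mention: sentence number plus first and last word index. */
final class MentionSpan {
    final int sentenceNo;
    final int startWord;
    final int endWord;

    MentionSpan(int sentenceNo, int startWord, int endWord) {
        if (startWord > endWord) {
            throw new IllegalArgumentException("startWord must not exceed endWord");
        }
        this.sentenceNo = sentenceNo;
        this.startWord = startWord;
        this.endWord = endWord;
    }
}

/** Hypothetical coreference annotation attached to one Token or Chunk. */
final class CorefFeature {
    private final boolean representative;
    private final List<MentionSpan> mentions;

    CorefFeature(boolean representative, List<MentionSpan> mentions) {
        this.representative = representative;
        // Defensive copy so later changes to the caller's list do not leak in.
        this.mentions = Collections.unmodifiableList(new ArrayList<>(mentions));
    }

    boolean isRepresentative() {
        return representative;
    }

    List<MentionSpan> getMentions() {
        return mentions;
    }

    /** Renders the fields in the JSON shape proposed in the thread (without "tag" and "class"). */
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"isRepresentative\":").append(representative).append(",\"mentions\":[");
        for (int i = 0; i < mentions.size(); i++) {
            MentionSpan m = mentions.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"sentenceNo\":").append(m.sentenceNo)
              .append(",\"startWord\":").append(m.startWord)
              .append(",\"endWord\":").append(m.endWord).append('}');
        }
        return sb.append("]}").toString();
    }
}

class CorefFeatureDemo {
    public static void main(String[] args) {
        List<MentionSpan> others = new ArrayList<>();
        others.add(new MentionSpan(1, 2, 3));
        CorefFeature feature = new CorefFeature(true, others);
        System.out.println(feature.toJson());
    }
}
```

In an actual implementation the serialization would presumably live in the nlp-json module next to PosTagSupport, as Rupert suggests, rather than in a hand-written `toJson()` method; the sketch only shows which fields the model needs to carry.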