stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Relation extraction feature
Date Thu, 12 Sep 2013 16:17:27 GMT
Hi Cristian,

In fact I missed it. Sorry for that.

I think the revised proposal looks like a good start. Usually one
needs to make some adaptations when writing the actual code.

If you have a first version, attach it to an issue and I will commit it
to the branch.

best
Rupert


On Thu, Sep 12, 2013 at 9:04 AM, Cristian Petroaca
<cristian.petroaca@gmail.com> wrote:
> Hi Rupert,
>
> This is a reminder in case you missed this e-mail.
>
> Cristian
>
>
> 2013/9/3 Cristian Petroaca <cristian.petroaca@gmail.com>
>
>> Ok, then to sum it up we would have :
>>
>> 1. Coref
>>
>> "stanbol.enhancer.nlp.coref" {
>>     "isRepresentative" : true/false, // whether this token or chunk is the
>> representative mention in the chain
>>     "mentions" : [ { "type" : "Token", // type of element which refers to
>> this token/chunk
>>  "start": 123 , // start index of the mentioning element
>>  "end": 130 // end index of the mentioning element
>>                     }, ...
>>                  ],
>>     "class" : ""class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>> }
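For illustration, a minimal Java sketch of how this coref annotation could be modeled as a plain value object; the class and field names (CorefFeature, SpanRef) are hypothetical and do not reflect the actual Stanbol API.

    import java.util.List;

    /** Illustrative value object mirroring the proposed "stanbol.enhancer.nlp.coref" JSON. */
    public class CorefFeature {

        /** Reference to another span, identified by its type and start/end offsets. */
        public static class SpanRef {
            public final String type;  // "Sentence", "Chunk" or "Token"
            public final int start;    // start index of the mentioning element
            public final int end;      // end index of the mentioning element
            public SpanRef(String type, int start, int end) {
                this.type = type;
                this.start = start;
                this.end = end;
            }
        }

        public final boolean representative;  // is this span the representative mention in the chain?
        public final List<SpanRef> mentions;  // the other mentions of the coreference chain
        public CorefFeature(boolean representative, List<SpanRef> mentions) {
            this.representative = representative;
            this.mentions = mentions;
        }
    }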
>>
>>
>> 2. Dependency tree
>>
>> "stanbol.enhancer.nlp.dependency" : {
>> "relations" : [ { "tag" : "nsubj", //type of relation - Stanford NLP
>> notation
>>                        "dep" : 12, // type of relation - Stanbol NLP
>> mapped value - ordinal number in enum Dependency
>> "role" : "gov/dep", // whether this token is the depender or the dependee
>>  "type" : "Token", // type of element with which this token is in relation
>> "start" : 123, // start index of the relating token
>>  "end" : 130 // end index of the relating token
>> },
>> ...
>>  ]
>> "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>> }
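A corresponding hedged Java sketch for a single entry of the proposed "relations" array; the names (DependencyRelation, Role) are illustrative stand-ins, not the actual DependencyTag model.

    /** Illustrative value object for one entry of the proposed "relations" array. */
    public class DependencyRelation {

        /** Whether this token is the depender (gov) or the dependee (dep). */
        public enum Role { GOV, DEP }

        public final String tag;        // relation type in Stanford NLP notation, e.g. "nsubj"
        public final int dep;           // ordinal of the mapped value in a Stanbol Dependency enum
        public final Role role;
        public final String targetType; // type of the related span, e.g. "Token"
        public final int targetStart;   // start index of the relating token
        public final int targetEnd;     // end index of the relating token

        public DependencyRelation(String tag, int dep, Role role,
                                  String targetType, int targetStart, int targetEnd) {
            this.tag = tag;
            this.dep = dep;
            this.role = role;
            this.targetType = targetType;
            this.targetStart = targetStart;
            this.targetEnd = targetEnd;
        }
    }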
>>
>>
>> 2013/9/2 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>
>>> Hi Cristian,
>>>
>>> let me provide some feedback to your proposals:
>>>
>>> ### Referring other Spans
>>>
>>> Both suggested annotations require linking to other spans (Sentence,
>>> Chunk or Token). For that we should introduce a JSON element used for
>>> referring to those elements and use it for all such links.
>>>
>>> In the Java model this would allow you to have a reference to the
>>> other Span (Sentence, Chunk, Token). In the serialized form you would
>>> have JSON elements with the "type", "start" and "end" attributes, as
>>> those three uniquely identify any span.
>>>
>>> Here is an example based on the "mentions" attribute as defined by the
>>> proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag":
>>>
>>>     ...
>>>     "mentions" : [ {
>>>         "type" : "Token",
>>>         "start": 123 ,
>>>         "end": 130 } ,{
>>>         "type" : "Token",
>>>         "start": 157 ,
>>>         "end": 165 }],
>>>     ...
>>>
>>> Similarly, token links in
>>> "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should also
>>> use this model.
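To make the shared span-reference element concrete, here is a small sketch of how such a {"type", "start", "end"} element could be written with Jackson; it only illustrates the convention and is not the actual nlp-json serialization code.

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    public class SpanRefJsonExample {

        private static final ObjectMapper MAPPER = new ObjectMapper();

        /** Writes a span reference as {"type": ..., "start": ..., "end": ...}. */
        static ObjectNode writeSpanRef(String type, int start, int end) {
            ObjectNode node = MAPPER.createObjectNode();
            node.put("type", type);
            node.put("start", start);
            node.put("end", end);
            return node;
        }

        public static void main(String[] args) {
            // prints {"type":"Token","start":123,"end":130}
            System.out.println(writeSpanRef("Token", 123, 130));
        }
    }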
>>>
>>> ### Usage of Controlled Vocabularies
>>>
>>> In addition the DependencyTag also seems to use a controlled
>>> vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
>>> NLP module tries to define those in some kind of ontology. For POS
>>> tags we use the OLIA ontology [1]. This is important as most NLP
>>> frameworks will use different strings and we need to unify those to
>>> common IDs so that components that consume those data do not depend on
>>> a specific NLP tool.
>>>
>>> Because the usage of ontologies within Java is not well supported, the
>>> Stanbol NLP module defines Java enumerations for those ontologies, such
>>> as the POS type enumeration [2].
>>>
>>> Both the Java model and the JSON serialization support
>>> (1) the lexical tag as used by the NLP tool and (2) the mapped
>>> concept: in the Java API via two different methods and in the JSON
>>> serialization via two separate keys.
>>>
>>> To make this clearer, here is an example of a POS annotation for a proper
>>> noun:
>>>
>>>     "stanbol.enhancer.nlp.pos" : {
>>>         "tag" : "PN",
>>>         "pos" : 53,
>>>         "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
>>>         "prob" : 0.95
>>>     }
>>>
>>> where
>>>
>>>     "tag" : "PN"
>>>
>>> is the lexical form as used by the NLP tool and
>>>
>>>     "pos" : 53
>>>
>>> refers to the ordinal number of the entry "ProperNoun" in the POS
>>> enumeration.
>>>
>>> IMO the "type" property of DependencyTag should use a similar design.
>>>
>>> best
>>> Rupert
>>>
>>> [1] http://olia.nlp2rdf.org/
>>> [2]
>>> http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java
>>>
>>> On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
>>> <cristian.petroaca@gmail.com> wrote:
>>> > Sorry, pressed send too soon :).
>>> >
>>> > Continued :
>>> >
>>> > nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
>>> > root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
>>> >
>>> > Given this, we can have for each "Token" an additional dependency
>>> > annotation :
>>> >
>>> > "stanbol.enhancer.nlp.dependency" : {
>>> > "tag" : //is it necessary?
>>> > "relations" : [ { "type" : "nsubj", //type of relation
>>> >   "role" : "gov/dep", //whether it is depender or the dependee
>>> >   "dependencyValue" : "met", // the word with which the token has a
>>> relation
>>> >   "dependencyIndexInSentence" : "2" //the index of the dependency in the
>>> > current sentence
>>> > }
>>> > ...
>>> > ]
>>> >                 "class" :
>>> > "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
>>> >         }
>>> >
>>> > 2013/9/1 Cristian Petroaca <cristian.petroaca@gmail.com>
>>> >
>>> >> Related to the Stanford Dependency Tree feature, this is what the
>>> >> output from the tool looks like for this sentence: "Mary and Tom met
>>> >> Danny today":
>>> >>
>>> >>
>>> >> 2013/8/30 Cristian Petroaca <cristian.petroaca@gmail.com>
>>> >>
>>> >>> Hi Rupert,
>>> >>>
>>> >>> Ok, so after looking at the JSON output from the Stanford NLP Server and
>>> >>> the coref module I'm thinking I can represent the coreference information
>>> >>> this way:
>>> >>> Each "Token" or "Chunk" will contain an additional coref annotation with
>>> >>> the following structure:
>>> >>>
>>> >>> "stanbol.enhancer.nlp.coref" {
>>> >>>     "tag" : //does this need to exist?
>>> >>>     "isRepresentative" : true/false, // whether this token or chunk is
>>> >>> the representative mention in the chain
>>> >>>     "mentions" : [ { "sentenceNo" : 1 //the sentence in which the
>>> mention
>>> >>> is found
>>> >>>                            "startWord" : 2 //the first word making up
>>> the
>>> >>> mention
>>> >>>                            "endWord" : 3 //the last word making up the
>>> >>> mention
>>> >>>                          }, ...
>>> >>>                        ],
>>> >>>     "class" : ""class" :
>>> "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>>> >>> }
>>> >>>
>>> >>> The CorefTag should resemble this model.
>>> >>>
>>> >>> What do you think?
>>> >>>
>>> >>> Cristian
>>> >>>
>>> >>>
>>> >>> 2013/8/24 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>> >>>
>>> >>>> Hi Cristian,
>>> >>>>
>>> >>>> you can not directly call StanfordNLP components from Stanbol, but
>>> you
>>> >>>> have to extend the RESTful service to include the information you
>>> >>>> need. The main reason for that is that the license of StanfordNLP is
>>> >>>> not compatible with the Apache Software License. So Stanbol can not
>>> >>>> directly link to the StanfordNLP API.
>>> >>>>
>>> >>>> You will need to
>>> >>>>
>>> >>>> 1. define an additional class {yourTag} extends Tag<{yourType}> class
>>> >>>> in the o.a.s.enhancer.nlp module
>>> >>>> 2. add JSON parsing and serialization support for this tag to the
>>> >>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>> >>>>
>>> >>>> As (1) would be necessary anyway, the only additional thing you need
>>> >>>> to develop is (2). After that you can add {yourTag} instances to the
>>> >>>> AnalyzedText in the StanfordNLP integration. The
>>> >>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>> >>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>> >>>> to your annotations.
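As a rough illustration of step (1), a custom tag could look like the sketch below; the Tag base class here is a simplified stand-in and not the actual o.a.s.enhancer.nlp Tag class, whose constructor and generics may differ.

    /** Simplified stand-in for the Stanbol Tag base class (illustration only). */
    abstract class Tag<T extends Tag<T>> {
        private final String tag;
        protected Tag(String tag) { this.tag = tag; }
        public String getTag() { return tag; }
    }

    /** Sketch of a custom tag carrying coreference information. */
    class CorefTag extends Tag<CorefTag> {
        private final boolean representative;
        public CorefTag(String tag, boolean representative) {
            super(tag);
            this.representative = representative;
        }
        public boolean isRepresentative() { return representative; }
    }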
>>> >>>>
>>> >>>> If you have a design for {yourTag} - the model you would like to use
>>> >>>> to represent your data - I can help with (1) and (2).
>>> >>>>
>>> >>>> best
>>> >>>> Rupert
>>> >>>>
>>> >>>>
>>> >>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>> >>>> <cristian.petroaca@gmail.com> wrote:
>>> >>>> > Hi Rupert,
>>> >>>> >
>>> >>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see
>>> >>>> > that Stanford NLP is not implemented as an EnhancementEngine but rather
>>> >>>> > it is used directly in a Jetty Server instance. How does that fit into
>>> >>>> > the Stanbol stack? For example how can I call the StanfordNlpAnalyzer's
>>> >>>> > routine from my TripleExtractionEnhancementEngine which lives in the
>>> >>>> > Stanbol stack?
>>> >>>> >
>>> >>>> > Thanks,
>>> >>>> > Cristian
>>> >>>> >
>>> >>>> >
>>> >>>> > 2013/8/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>> >>>> >
>>> >>>> >> Hi Cristian,
>>> >>>> >>
>>> >>>> >> Sorry for the late response, but I was offline for the last two
>>> weeks
>>> >>>> >>
>>> >>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>>> >>>> >> <cristian.petroaca@gmail.com> wrote:
>>> >>>> >> > Hi Rupert,
>>> >>>> >> >
>>> >>>> >> > After doing some tests it seems that the Stanford NLP coreference
>>> >>>> >> > module is much more accurate than the Open NLP one. So I decided to
>>> >>>> >> > extend Stanford NLP to add coreference there.
>>> >>>> >>
>>> >>>> >> The Stanford NLP integration is not part of the Stanbol codebase
>>> >>>> >> because the licenses are not compatible.
>>> >>>> >>
>>> >>>> >> You can find the Stanford NLP integration on
>>> >>>> >>
>>> >>>> >>     https://github.com/westei/stanbol-stanfordnlp
>>> >>>> >>
>>> >>>> >> just create a fork and send pull requests.
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> > Could you add the necessary projects on the branch? And also
>>> remove
>>> >>>> the
>>> >>>> >> > Open NLP ones?
>>> >>>> >> >
>>> >>>> >>
>>> >>>> >> Currently the branch
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>>
>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>> >>>> >>
>>> >>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those
>>> should
>>> >>>> >> be enough for adding coreference support.
>>> >>>> >>
>>> >>>> >> IMO you will need to
>>> >>>> >>
>>> >>>> >> * add a model for representing coreference to the nlp module
>>> >>>> >> * add parsing and serializing support to the nlp-json module
>>> >>>> >> * add the implementation to your fork of the stanbol-stanfordnlp
>>> >>>> project
>>> >>>> >>
>>> >>>> >> best
>>> >>>> >> Rupert
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> > Thanks,
>>> >>>> >> > Cristian
>>> >>>> >> >
>>> >>>> >> >
>>> >>>> >> > 2013/7/5 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>> >>>> >> >
>>> >>>> >> >> Hi Cristian,
>>> >>>> >> >>
>>> >>>> >> >> I created the branch at
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>>
>>> http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>> >>>> >> >>
>>> >>>> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know
>>> >>>> >> >> if you would like to have more.
>>> >>>> >> >>
>>> >>>> >> >> best
>>> >>>> >> >> Rupert
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>>> >>>> >> >> <cristian.petroaca@gmail.com> wrote:
>>> >>>> >> >> > Hi Rupert,
>>> >>>> >> >> >
>>> >>>> >> >> > I created JIRAs https://issues.apache.org/jira/browse/STANBOL-1132
>>> >>>> >> >> > and https://issues.apache.org/jira/browse/STANBOL-1133. The
>>> >>>> >> >> > original one is dependent upon these.
>>> >>>> >> >> > Please let me know when I can start using the branch.
>>> >>>> >> >> >
>>> >>>> >> >> > Thanks,
>>> >>>> >> >> > Cristian
>>> >>>> >> >> >
>>> >>>> >> >> >
>>> >>>> >> >> > 2013/6/27 Cristian Petroaca <cristian.petroaca@gmail.com>
>>> >>>> >> >> >
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >> >> 2013/6/27 Rupert Westenthaler <
>>> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>
>>> >>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>>> >>>> >> >> >>> <cristian.petroaca@gmail.com> wrote:
>>> >>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford in my
>>> >>>> previous
>>> >>>> >> >> e-mail.
>>> >>>> >> >> >>> By
>>> >>>> >> >> >>> > the way, does Open NLP have the ability to build
>>> dependency
>>> >>>> trees?
>>> >>>> >> >> >>> >
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>
>>> >>>> >> >> >> Then , since the Stanford NLP lib is also integrated into
>>> >>>> Stanbol,
>>> >>>> >> I'll
>>> >>>> >> >> >> take a look at how I can extend its integration to include
>>> the
>>> >>>> >> >> dependency
>>> >>>> >> >> >> tree feature.
>>> >>>> >> >> >>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>  >
>>> >>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petroaca@gmail.com
>>> >
>>> >>>> >> >> >>> >
>>> >>>> >> >> >>> >> Hi Rupert,
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >> I created jira
>>> >>>> >> https://issues.apache.org/jira/browse/STANBOL-1121.
>>> >>>> >> >> >>> >> As you suggested I would start with extending the
>>> Stanford
>>> >>>> NLP
>>> >>>> >> with
>>> >>>> >> >> >>> >> co-reference resolution but I think also with dependency
>>> >>>> trees
>>> >>>> >> >> because
>>> >>>> >> >> >>> I
>>> >>>> >> >> >>> >> also need to know the Subject of the sentence and the
>>> object
>>> >>>> >> that it
>>> >>>> >> >> >>> >> affects, right?
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in
>>> Stanbol
>>> >>>> for
>>> >>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with
>>> >>>> this?
>>> >>>> >> Do I
>>> >>>> >> >> >>> create
>>> >>>> >> >> >>> >> 2 new sub-tasks to the already opened Jira? After that
>>> can I
>>> >>>> >> start
>>> >>>> >> >> >>> >> implementing on my local copy of Stanbol and when I'm done
>>> >>>> >> >> >>> >> I'll send you guys the patch for review?
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> I would create two "New Feature" type Issues one for adding
>>> >>>> support
>>> >>>> >> >> >>> for "dependency trees" and the other for "co-reference"
>>> >>>> support. You
>>> >>>> >> >> >>> should also define "depends on" relations between
>>> STANBOL-1121
>>> >>>> and
>>> >>>> >> >> >>> those two new issues.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> Sub-task could also work, but as adding those features
>>> would
>>> >>>> be also
>>> >>>> >> >> >>> interesting for other things I would rather define them as
>>> >>>> separate
>>> >>>> >> >> >>> issues.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >> 2 New Features connected with the original jira it is then.
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >> >>> If you would prefer to work in an own branch please tell
>>> me.
>>> >>>> This
>>> >>>> >> >> >>> could have the advantage that patches would not be
>>> affected by
>>> >>>> >> changes
>>> >>>> >> >> >>> in the trunk.
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> Yes, a separate branch sounds good.
>>> >>>> >> >> >>
>>> >>>> >> >> >> best
>>> >>>> >> >> >>> Rupert
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> >> Regards,
>>> >>>> >> >> >>> >> Cristian
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <
>>> >>>> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>>> >>>> >> >> >>> >>> <cristian.petroaca@gmail.com> wrote:
>>> >>>> >> >> >>> >>> > Hi Rupert,
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > Agreed on the
>>> >>>> >> >> >>> SettingAnnotation/ParticipantAnnotation/OccurentAnnotation
>>> >>>> >> >> >>> >>> > data structure.
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > Should I open up a Jira for all of this in order to
>>> >>>> >> encapsulate
>>> >>>> >> >> this
>>> >>>> >> >> >>> >>> > information and establish the goals and these initial
>>> >>>> steps
>>> >>>> >> >> towards
>>> >>>> >> >> >>> >>> these
>>> >>>> >> >> >>> >>> > goals?
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> > How should I proceed further? Should I create some
>>> design
>>> >>>> >> >> documents
>>> >>>> >> >> >>> that
>>> >>>> >> >> >>> >>> > need to be reviewed?
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> Usually it is best to write design-related text directly in
>>> >>>> >> >> >>> >>> JIRA by using Markdown [1] syntax. This will allow us later to
>>> >>>> >> >> >>> >>> use this text directly for the documentation on the Stanbol
>>> >>>> >> >> >>> >>> Webpage.
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> best
>>> >>>> >> >> >>> >>> Rupert
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > Regards,
>>> >>>> >> >> >>> >>> > Cristian
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <
>>> >>>> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>> >>> >
>>> >>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>>> >>>> >> >> >>> >>> >> <cristian.petroaca@gmail.com> wrote:
>>> >>>> >> >> >>> >>> >> > HI Rupert,
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <
>>> >>>> >> rupert.westenthaler@gmail.com>
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> >> Hi Cristian, all
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> really interesting use case!
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could
>>> >>>> >> >> >>> >>> >> >> work out. These suggestions are mainly based on experiences and lessons
>>> >>>> >> >> >>> >>> >> >> learned in the LIVE [2] project where we built an information system
>>> >>>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this project excluded the
>>> >>>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because the Olympic
>>> >>>> >> >> >>> >>> >> >> Information System was already providing event data as XML messages)
>>> >>>> >> >> >>> >>> >> >> the semantic search capabilities of this system were very similar to
>>> >>>> >> >> >>> >>> >> >> the one described by your use case.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract
>>> relations,
>>> >>>> but a
>>> >>>> >> >> formal
>>> >>>> >> >> >>> >>> >> >> representation of the situation described by the
>>> >>>> text. So
>>> >>>> >> >> lets
>>> >>>> >> >> >>> >>> assume
>>> >>>> >> >> >>> >>> >> >> that the goal is to Annotate a Setting (or
>>> Situation)
>>> >>>> >> >> described
>>> >>>> >> >> >>> in
>>> >>>> >> >> >>> >>> the
>>> >>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to model
>>> >>>> >> >> >>> >>> >> >> those. The important relation for modeling this is Participation:
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> where ..
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do
>>> have
>>> >>>> an
>>> >>>> >> >> >>> identity so
>>> >>>> >> >> >>> >>> we
>>> >>>> >> >> >>> >>> >> >> would typically refer to them as Entities
>>> referenced
>>> >>>> by a
>>> >>>> >> >> >>> setting.
>>> >>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as
>>> >>>> well as
>>> >>>> >> >> >>> >>> >> >> social-objects.
>>> >>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents):  Perdurants
>>> are
>>> >>>> >> entities
>>> >>>> >> >> that
>>> >>>> >> >> >>> >>> >> >> happen in time. This refers to Events,
>>> Activities ...
>>> >>>> >> >> >>> >>> >> >>  * PC is Participation: a time-indexed relation where
>>> >>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Modeling this in RDF requires to define some
>>> >>>> intermediate
>>> >>>> >> >> >>> resources
>>> >>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to
>>> >>>> define
>>> >>>> >> one
>>> >>>> >> >> >>> resource
>>> >>>> >> >> >>> >>> >> >> being the context for all described data. I would
>>> >>>> call
>>> >>>> >> this
>>> >>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a
>>> >>>> sub-concept to
>>> >>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancement about
>>> the
>>> >>>> >> extracted
>>> >>>> >> >> >>> >>> Setting
>>> >>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to
>>> annotate
>>> >>>> that
>>> >>>> >> >> >>> Endurant is
>>> >>>> >> >> >>> >>> >> >> participating on a setting (fise:in-setting
>>> >>>> >> >> >>> fise:SettingAnnotation).
>>> >>>> >> >> >>> >>> >> >> The Endurant itself is described by existing
>>> >>>> >> >> fise:TextAnnotaion
>>> >>>> >> >> >>> (the
>>> >>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested
>>> >>>> Entities).
>>> >>>> >> >> >>> Basically
>>> >>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an
>>> >>>> >> >> EnhancementEngine
>>> >>>> >> >> >>> to
>>> >>>> >> >> >>> >>> >> >> state that several mentions (in possible
>>> different
>>> >>>> >> >> sentences) do
>>> >>>> >> >> >>> >>> >> >> represent the same Endurant as participating in
>>> the
>>> >>>> >> Setting.
>>> >>>> >> >> In
>>> >>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type
>>> >>>> property
>>> >>>> >> >> >>> (similar
>>> >>>> >> >> >>> >>> as
>>> >>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s)
>>> of
>>> >>>> an
>>> >>>> >> >> >>> participant
>>> >>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intensionally performs an
>>> >>>> action)
>>> >>>> >> Cause
>>> >>>> >> >> >>> >>> >> >> (unintentionally e.g. a mud slide), Patient (a
>>> >>>> passive
>>> >>>> >> role
>>> >>>> >> >> in
>>> >>>> >> >> >>> an
>>> >>>> >> >> >>> >>> >> >> activity) and Instrument (aids an process)), but
>>> I am
>>> >>>> >> >> wondering
>>> >>>> >> >> >>> if
>>> >>>> >> >> >>> >>> one
>>> >>>> >> >> >>> >>> >> >> could extract those information.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> * fise:OccurrentAnnotation: is used to annotate a
>>> >>>> >> Perdurant
>>> >>>> >> >> in
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> >> context of the Setting. Also
>>> >>>> fise:OccurrentAnnotation can
>>> >>>> >> >> link
>>> >>>> >> >> >>> to
>>> >>>> >> >> >>> >>> >> >> fise:TextAnnotaion (typically verbs in the text
>>> >>>> defining
>>> >>>> >> the
>>> >>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation
>>> >>>> suggesting
>>> >>>> >> well
>>> >>>> >> >> >>> known
>>> >>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. a Election in a
>>> >>>> country,
>>> >>>> >> or
>>> >>>> >> >> an
>>> >>>> >> >> >>> >>> >> >> upraising ...). In addition
>>> fise:OccurrentAnnotation
>>> >>>> can
>>> >>>> >> >> define
>>> >>>> >> >> >>> >>> >> >> dc:has-participant links to
>>> >>>> fise:ParticipantAnnotation. In
>>> >>>> >> >> this
>>> >>>> >> >> >>> case
>>> >>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
>>> >>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
>>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are
>>> >>>> temporal
>>> >>>> >> >> indexed
>>> >>>> >> >> >>> this
>>> >>>> >> >> >>> >>> >> >> annotation should also support properties for
>>> >>>> defining the
>>> >>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Indeed, an event based data structure makes a
>>> lot of
>>> >>>> sense
>>> >>>> >> >> with
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> remark
>>> >>>> >> >> >>> >>> >> > that you probably won't be able to always extract
>>> the
>>> >>>> date
>>> >>>> >> >> for a
>>> >>>> >> >> >>> >>> given
>>> >>>> >> >> >>> >>> >> > setting(situation).
>>> >>>> >> >> >>> >>> >> > There are 2 thing which are unclear though.
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > 1. Perdurant : You could have situations in which
>>> the
>>> >>>> >> object
>>> >>>> >> >> upon
>>> >>>> >> >> >>> >>> which
>>> >>>> >> >> >>> >>> >> the
>>> >>>> >> >> >>> >>> >> > Subject ( or Endurant ) is acting is not a
>>> transitory
>>> >>>> >> object (
>>> >>>> >> >> >>> such
>>> >>>> >> >> >>> >>> as an
>>> >>>> >> >> >>> >>> >> > event, activity ) but rather another Endurant. For
>>> >>>> example
>>> >>>> >> we
>>> >>>> >> >> can
>>> >>>> >> >> >>> >>> have
>>> >>>> >> >> >>> >>> >> the
>>> >>>> >> >> >>> >>> >> > phrase "USA invades Irak" where "USA" is the
>>> Endurant
>>> >>>> (
>>> >>>> >> >> Subject )
>>> >>>> >> >> >>> >>> which
>>> >>>> >> >> >>> >>> >> > performs the action of "invading" on another
>>> >>>> Eundurant,
>>> >>>> >> namely
>>> >>>> >> >> >>> >>> "Irak".
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the
>>> >>>> Patient.
>>> >>>> >> Both
>>> >>>> >> >> >>> are
>>> >>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the
>>> >>>> Perdurant. So
>>> >>>> >> >> >>> ideally
>>> >>>> >> >> >>> >>> >> you would have a  "fise:SettingAnnotation" with:
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the
>>> dc:type
>>> >>>> >> >> caos:Agent,
>>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a
>>> >>>> >> >> >>> fise:EntityAnnotation
>>> >>>> >> >> >>> >>> >> linking to dbpedia:United_States
>>> >>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the
>>> dc:type
>>> >>>> >> >> >>> caos:Patient,
>>> >>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>>> >>>> >> >> >>> >>> >> fise:EntityAnnotation linking to  dbpedia:Iraq
>>> >>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the
>>> >>>> dc:type
>>> >>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for
>>> >>>> "invades"
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject
>>> and
>>> >>>> the
>>> >>>> >> Object
>>> >>>> >> >> >>> come
>>> >>>> >> >> >>> >>> into
>>> >>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a
>>> >>>> >> dc:"property"
>>> >>>> >> >> >>> where
>>> >>>> >> >> >>> >>> the
>>> >>>> >> >> >>> >>> >> > property = verb which links to the Object in noun
>>> >>>> form. For
>>> >>>> >> >> >>> example
>>> >>>> >> >> >>> >>> take
>>> >>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would
>>> have
>>> >>>> the
>>> >>>> >> >> "USA"
>>> >>>> >> >> >>> >>> Entity
>>> >>>> >> >> >>> >>> >> with
>>> >>>> >> >> >>> >>> >> > dc:invader which points to the Object "Irak". The
>>> >>>> Endurant
>>> >>>> >> >> would
>>> >>>> >> >> >>> >>> have as
>>> >>>> >> >> >>> >>> >> > many dc:"property" elements as there are verbs
>>> which
>>> >>>> link
>>> >>>> >> it
>>> >>>> >> >> to
>>> >>>> >> >> >>> an
>>> >>>> >> >> >>> >>> >> Object.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> As explained above you would have a
>>> >>>> fise:OccurrentAnnotation
>>> >>>> >> >> that
>>> >>>> >> >> >>> >>> >> represents the Perdurant. The information that the activity mentioned in
>>> >>>> >> >> >>> >>> >> the text is "invades" would be provided by linking to a
>>> >>>> >> >> >>> >>> >> fise:TextAnnotation. If
>>> >>>> >> >> >>> >>> >> you can also provide an Ontology for Tasks that
>>> defines
>>> >>>> >> >> >>> >>> >> "myTasks:invade" the fise:OccurrentAnnotation could
>>> >>>> also link
>>> >>>> >> >> to an
>>> >>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> best
>>> >>>> >> >> >>> >>> >> Rupert
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > ### Consuming the data:
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> I think this model should be sufficient for
>>> >>>> use-cases as
>>> >>>> >> >> >>> described
>>> >>>> >> >> >>> >>> by
>>> >>>> >> >> >>> >>> >> you.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Users would be able to consume data on the
>>> setting
>>> >>>> level.
>>> >>>> >> >> This
>>> >>>> >> >> >>> can
>>> >>>> >> >> >>> >>> be
>>> >>>> >> >> >>> >>> >> >> done by simply retrieving all fise:ParticipantAnnotation as
>>> >>>> >> >> >>> well as
>>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting.
>>> BTW
>>> >>>> this
>>> >>>> >> was
>>> >>>> >> >> the
>>> >>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It
>>> >>>> allows
>>> >>>> >> >> >>> queries for
>>> >>>> >> >> >>> >>> >> >> Settings that involve specific Entities e.g. you
>>> >>>> could
>>> >>>> >> filter
>>> >>>> >> >> >>> for
>>> >>>> >> >> >>> >>> >> >> Settings that involve a {Person},
>>> >>>> activities:Arrested and
>>> >>>> >> a
>>> >>>> >> >> >>> specific
>>> >>>> >> >> >>> >>> >> >> {Upraising}. However note that with this approach
>>> >>>> you will
>>> >>>> >> >> get
>>> >>>> >> >> >>> >>> results
>>> >>>> >> >> >>> >>> >> >> for Setting where the {Person} participated and
>>> an
>>> >>>> other
>>> >>>> >> >> person
>>> >>>> >> >> >>> was
>>> >>>> >> >> >>> >>> >> >> arrested.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Another possibility would be to process enhancement results on the
>>> >>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation. This would allow a much higher
>>> >>>> >> >> >>> >>> >> >> granularity level (e.g. it would allow to
>>> correctly
>>> >>>> answer
>>> >>>> >> >> the
>>> >>>> >> >> >>> query
>>> >>>> >> >> >>> >>> >> >> used as an example above). But I am wondering if
>>> the
>>> >>>> >> quality
>>> >>>> >> >> of
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I
>>> >>>> have
>>> >>>> >> also
>>> >>>> >> >> >>> doubts
>>> >>>> >> >> >>> >>> if
>>> >>>> >> >> >>> >>> >> >> this can be still realized by using semantic
>>> >>>> indexing to
>>> >>>> >> >> Apache
>>> >>>> >> >> >>> Solr
>>> >>>> >> >> >>> >>> >> >> or if it would be better/necessary to store
>>> results
>>> >>>> in a
>>> >>>> >> >> >>> TripleStore
>>> >>>> >> >> >>> >>> >> >> and using SPARQL for retrieval.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO
>>> [3]
>>> >>>> is
>>> >>>> >> also
>>> >>>> >> >> very
>>> >>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7
>>> SPOTL(X)
>>> >>>> >> >> >>> >>> Representation).
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities (especially
>>> >>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings extracted from
>>> >>>> >> >> >>> >>> >> >> Documents.
>>> >>>> >> >> >>> >>> >> >> As per definition - in DOLCE - Perdurants are temporally indexed. That
>>> >>>> >> >> >>> >>> >> >> means that at the time when added to a knowledge base they might still
>>> >>>> >> >> >>> >>> >> >> be in process. So the creation, enriching and refinement of such
>>> >>>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be critical for a system
>>> >>>> >> >> >>> >>> >> >> like the one described in your use-case.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian
>>> Petroaca
>>> >>>> >> >> >>> >>> >> >> <cristian.petroaca@gmail.com> wrote:
>>> >>>> >> >> >>> >>> >> >> >
>>> >>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new to the field of semantic
>>> >>>> >> >> >>> >>> >> >> > technologies, I've started to read about them in the last 4-5 months.
>>> >>>> >> >> >>> >>> >> >> > Having said that, I have a high level overview of what is a good approach to solve
>>> >>>> >> >> >>> >>> >> >> > this problem. There are a number of papers on
>>> the
>>> >>>> >> internet
>>> >>>> >> >> >>> which
>>> >>>> >> >> >>> >>> >> describe
>>> >>>> >> >> >>> >>> >> >> > what steps need to be taken such as : named
>>> entity
>>> >>>> >> >> >>> recognition,
>>> >>>> >> >> >>> >>> >> >> > co-reference resolution, pos tagging and
>>> others.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only supports sentence
>>> >>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, chunking, NER and lemmas. Support
>>> >>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees is currently missing.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol
>>> [4].
>>> >>>> At
>>> >>>> >> the
>>> >>>> >> >> >>> moment
>>> >>>> >> >> >>> >>> it
>>> >>>> >> >> >>> >>> >> >> only supports English, but I do already work to
>>> >>>> include
>>> >>>> >> the
>>> >>>> >> >> >>> other
>>> >>>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that are already integrated
>>> >>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6].
>>> But
>>> >>>> note
>>> >>>> >> >> that
>>> >>>> >> >> >>> for
>>> >>>> >> >> >>> >>> all
>>> >>>> >> >> >>> >>> >> >> those the integration excludes support for
>>> >>>> co-reference
>>> >>>> >> and
>>> >>>> >> >> >>> >>> dependency
>>> >>>> >> >> >>> >>> >> >> trees.
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> Anyways I am confident that one can implement a
>>> first
>>> >>>> >> >> prototype
>>> >>>> >> >> >>> by
>>> >>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if
>>> available
>>> >>>> -
>>> >>>> >> Chunks
>>> >>>> >> >> >>> (e.g.
>>> >>>> >> >> >>> >>> >> >> Noun phrases).
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature
>>> like
>>> >>>> >> Relation
>>> >>>> >> >> >>> >>> extraction
>>> >>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>>> >>>> >> >> >>> >>> >> > What kind of effort would be required for a
>>> >>>> co-reference
>>> >>>> >> >> >>> resolution
>>> >>>> >> >> >>> >>> tool
>>> >>>> >> >> >>> >>> >> > integration into Stanbol?
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> Yes in the end it would be an EnhancementEngine. But
>>> >>>> before
>>> >>>> >> we
>>> >>>> >> >> can
>>> >>>> >> >> >>> >>> >> build such an engine we would need to
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with
>>> >>>> Annotations for
>>> >>>> >> >> >>> >>> co-reference
>>> >>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for
>>> those
>>> >>>> >> >> annotation
>>> >>>> >> >> >>> so
>>> >>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide
>>> >>>> >> co-reference
>>> >>>> >> >> >>> >>> >> information
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > 1. Determine the best data structure to
>>> encapsulate
>>> >>>> the
>>> >>>> >> >> extracted
>>> >>>> >> >> >>> >>> >> > information. I'll take a closer look at Dolce.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to represent
>>> >>>> >> >> >>> >>> >> Events will only pay off if we can also successfully extract such
>>> >>>> >> >> >>> >>> >> information from processed texts.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> I would start with
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>  * fise:SettingAnnotation
>>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>> >>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple
>>> if
>>> >>>> there
>>> >>>> >> are
>>> >>>> >> >> >>> more
>>> >>>> >> >> >>> >>> >> suggestions)
>>> >>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient,
>>> >>>> >> fise:Instrument,
>>> >>>> >> >> >>> >>> fise:Cause
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>>> >>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>> >>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>> >>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>> >>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>>> >>>> >> >> >>> >>> >>
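As an illustration of this starting structure, a hedged sketch of the statements such an engine might emit for the earlier "USA invades Irak" example; the property and type names follow the proposal above, while the URIs are invented placeholders.

    import java.util.ArrayList;
    import java.util.List;

    /** Prints the proposed annotation structure as plain subject/predicate/object strings. */
    public class SettingStructureExample {
        public static void main(String[] args) {
            String setting = "urn:example:setting-1";            // placeholder URIs
            String usa     = "urn:example:participant-usa";
            String invades = "urn:example:occurrent-invades";

            List<String[]> statements = new ArrayList<>();
            statements.add(new String[]{usa, "fise:inSetting", setting});
            statements.add(new String[]{usa, "fise:hasMention", "urn:example:textannotation-usa"});
            statements.add(new String[]{usa, "dc:type", "fise:Agent"});
            statements.add(new String[]{invades, "fise:inSetting", setting});
            statements.add(new String[]{invades, "fise:hasMention", "urn:example:textannotation-invades"});
            statements.add(new String[]{invades, "dc:type", "fise:Activity"});

            for (String[] s : statements) {
                System.out.println(s[0] + " " + s[1] + " " + s[2]);
            }
        }
    }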
>>> >>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add
>>> >>>> more
>>> >>>> >> >> >>> structure to
>>> >>>> >> >> >>> >>> >> those annotations. We might also think about using
>>> an
>>> >>>> own
>>> >>>> >> >> namespace
>>> >>>> >> >> >>> >>> >> for those extensions to the annotation structure.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> > 2. Determine how should all of this be integrated
>>> into
>>> >>>> >> >> Stanbol.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an enhancement
>>> >>>> >> >> >>> >>> >> chain that does NLP processing and EntityLinking.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> You should have a look at
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1] as it does a lot of things with NLP
>>> >>>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via verbs) to
>>> >>>> >> >> >>> >>> >> nouns/pronouns). So as long as we cannot use explicit dependency trees
>>> >>>> >> >> >>> >>> >> your code will need to do similar things with Nouns, Pronouns and
>>> >>>> >> >> >>> >>> >> Verbs.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java
>>> >>>> >> >> representation
>>> >>>> >> >> >>> of
>>> >>>> >> >> >>> >>> >> present fise:TextAnnotation and
>>> fise:EntityAnnotation
>>> >>>> [2].
>>> >>>> >> >> >>> Something
>>> >>>> >> >> >>> >>> >> similar will also be required by the
>>> >>>> EventExtractionEngine
>>> >>>> >> for
>>> >>>> >> >> fast
>>> >>>> >> >> >>> >>> >> access to such annotations while iterating over the
>>> >>>> >> Sentences of
>>> >>>> >> >> >>> the
>>> >>>> >> >> >>> >>> >> text.
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> best
>>> >>>> >> >> >>> >>> >> Rupert
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> [1]
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>>
>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>>> >>>> >> >> >>> >>> >> [2]
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >>
>>> >>>> >>
>>> >>>>
>>> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > Thanks
>>> >>>> >> >> >>> >>> >> >
>>> >>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>>> >>>> >> >> >>> >>> >> >> best
>>> >>>> >> >> >>> >>> >> >> Rupert
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >> >> --
>>> >>>> >> >> >>> >>> >> >> | Rupert Westenthaler
>>> >>>> >> >> rupert.westenthaler@gmail.com
>>> >>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11
>>> >>>> >> >> >>> ++43-699-11108907
>>> >>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>>> >>>> >> >> >>> >>> >> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>> >> --
>>> >>>> >> >> >>> >>> >> | Rupert Westenthaler
>>> >>>> >> rupert.westenthaler@gmail.com
>>> >>>> >> >> >>> >>> >> | Bodenlehenstraße 11
>>> >>>> >> >> >>> ++43-699-11108907
>>> >>>> >> >> >>> >>> >> | A-5500 Bischofshofen
>>> >>>> >> >> >>> >>> >>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>> --
>>> >>>> >> >> >>> >>> | Rupert Westenthaler
>>> >>>> rupert.westenthaler@gmail.com
>>> >>>> >> >> >>> >>> | Bodenlehenstraße 11
>>> >>>> >> >> ++43-699-11108907
>>> >>>> >> >> >>> >>> | A-5500 Bischofshofen
>>> >>>> >> >> >>> >>>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>> >>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>>
>>> >>>> >> >> >>> --
>>> >>>> >> >> >>> | Rupert Westenthaler
>>> >>>> rupert.westenthaler@gmail.com
>>> >>>> >> >> >>> | Bodenlehenstraße 11
>>> >>>> ++43-699-11108907
>>> >>>> >> >> >>> | A-5500 Bischofshofen
>>> >>>> >> >> >>>
>>> >>>> >> >> >>
>>> >>>> >> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >>
>>> >>>> >> >> --
>>> >>>> >> >> | Rupert Westenthaler
>>> rupert.westenthaler@gmail.com
>>> >>>> >> >> | Bodenlehenstraße 11
>>> >>>> ++43-699-11108907
>>> >>>> >> >> | A-5500 Bischofshofen
>>> >>>> >> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> --
>>> >>>> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >>>> >> | Bodenlehenstraße 11
>>> ++43-699-11108907
>>> >>>> >> | A-5500 Bischofshofen
>>> >>>> >>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> >>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> >>>> | A-5500 Bischofshofen
>>> >>>>
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>>
>>> --
>>> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>> | Bodenlehenstraße 11                             ++43-699-11108907
>>> | A-5500 Bischofshofen
>>>
>>
>>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
