stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Relation extraction feature
Date Mon, 02 Sep 2013 13:47:26 GMT
Hi Cristian,

let me provide some feedback to your proposals:

### Referring other Spans

Both suggested annotations need to link to other spans (Sentence,
Chunk or Token). For that we should introduce a JSON element for
referring to those elements and use it consistently wherever such links occur.

In the Java model this would allow you to hold a direct reference to the
other Span (Sentence, Chunk, Token). In the serialized form you would
have JSON elements with the "type", "start" and "end" attributes, as
those three together uniquely identify any span.

Here is an example based on the "mentions" attribute as defined by the
proposed "org.apache.stanbol.enhancer.nlp.coref.CorefTag":

    ...
    "mentions" : [ {
        "type" : "Token",
        "start" : 123,
        "end" : 130
    }, {
        "type" : "Token",
        "start" : 157,
        "end" : 165
    } ],
    ...

The token links in the proposed
"org.apache.stanbol.enhancer.nlp.dependency.DependencyTag" should use
the same model.
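A minimal Java sketch of such a span reference, assuming a standalone value class: SpanType, SpanRef and toJsonArray are hypothetical names, not part of the Stanbol NLP API. It shows that both equality and serialization depend only on the type, start and end values:

```java
import java.util.List;
import java.util.Locale;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical span types; the constant names match the "type" values in the JSON.
enum SpanType { Sentence, Chunk, Token }

// Hypothetical reference to another span: (type, start, end) identifies it uniquely.
final class SpanRef {
    final SpanType type;
    final int start; // start offset of the referenced span
    final int end;   // end offset of the referenced span

    SpanRef(SpanType type, int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid offsets " + start + ".." + end);
        }
        this.type = Objects.requireNonNull(type, "type");
        this.start = start;
        this.end = end;
    }

    // One element of the "mentions" array.
    String toJson() {
        return String.format(Locale.ROOT, "{ \"type\" : \"%s\", \"start\" : %d, \"end\" : %d }",
                type.name(), start, end);
    }

    // The whole "mentions" array.
    static String toJsonArray(List<SpanRef> refs) {
        return refs.stream().map(SpanRef::toJson)
                .collect(Collectors.joining(", ", "[ ", " ]"));
    }

    // Two references are equal exactly when type, start and end are equal.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SpanRef)) return false;
        SpanRef other = (SpanRef) o;
        return type == other.type && start == other.start && end == other.end;
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, start, end);
    }
}
```

Because equality is defined on the three values alone, two links to the same Token compare equal even when they were parsed from different parts of the JSON.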

### Usage of Controlled Vocabularies

In addition, the DependencyTag also seems to use a controlled
vocabulary (e.g. 'nsubj', 'conj_and' ...). In such cases the Stanbol
NLP module tries to define the vocabulary in some kind of ontology. For POS
tags we use the OLIA ontology [1]. This is important because most NLP
frameworks use different strings, and we need to unify those to
common IDs so that components consuming this data do not depend on
a specific NLP tool.

Because ontologies are not well supported within Java, the Stanbol
NLP module defines Java enumerations for those ontologies, such as
the POS type enumeration [2].

Both the Java model and the JSON serialization support
(1) the lexical tag as used by the NLP tool and (2) the mapped
concept: the Java API exposes them via two different methods, the JSON
serialization via two separate keys.
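To illustrate keeping both representations, here is a hedged Java sketch: a per-tool table maps lexical tags onto a shared enumeration, and the result keeps the original tag as well. DependencyRelation, MappedTag and TagMapper are hypothetical; the relation names are only those occurring in the Stanford output quoted later in this thread, not an agreed vocabulary:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical tool-independent concepts (illustrative subset, not an existing Stanbol enum).
enum DependencyRelation { NOMINAL_SUBJECT, CONJUNCT, NOUN_COMPOUND, TEMPORAL_MODIFIER, ROOT }

// Keeps both representations: (1) the lexical tag and (2) the mapped concept.
final class MappedTag {
    private final String tag;                  // lexical form used by the NLP tool
    private final DependencyRelation relation; // mapped concept, null if unknown

    MappedTag(String tag, DependencyRelation relation) {
        this.tag = tag;
        this.relation = relation;
    }

    String getTag() { return tag; }

    DependencyRelation getRelation() { return relation; }
}

// One mapping table per NLP tool, so consumers only depend on the shared concepts.
final class TagMapper {
    private final Map<String, DependencyRelation> table = new HashMap<>();

    TagMapper put(String lexical, DependencyRelation concept) {
        table.put(lexical.toLowerCase(Locale.ROOT), concept);
        return this;
    }

    // The lexical tag is always kept; unmapped tags get a null concept.
    MappedTag map(String lexical) {
        return new MappedTag(lexical, table.get(lexical.toLowerCase(Locale.ROOT)));
    }
}
```

Collapsing a tag such as 'conj_and' onto a single CONJUNCT concept loses the conjunction word; keeping the lexical tag alongside the concept is what preserves it.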

To make this clearer, here is an example of a POS annotation for a proper noun.

    "stanbol.enhancer.nlp.pos" : {
        "tag" : "PN",
        "pos" : 53,
        "class" : "org.apache.stanbol.enhancer.nlp.pos.PosTag",
        "prob" : 0.95
    }

where

    "tag" : "PN"

is the lexical form as used by the NLP tool and

    "pos" : 53

refers to the ordinal number of the entry "ProperNoun" in the POS enumeration.

IMO the "type" property of DependencyTag should use a similar design.
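A small sketch of the ordinal-based encoding, assuming hand-written JSON output and a three-constant stand-in enum (PosConcept and OrdinalJson are hypothetical; in this toy enum ProperNoun has ordinal 2, not 53). The same write/read pair would serve a "type" key on a DependencyTag:

```java
// Illustrative subset only: here ProperNoun has ordinal 2, whereas in the real
// Stanbol Pos enumeration the "pos" value 53 identifies ProperNoun.
enum PosConcept { Noun, Verb, ProperNoun }

final class OrdinalJson {
    // Writes the lexical tag and, if mapped, the concept's ordinal under a separate key.
    // Assumes the lexical tag contains no characters that need JSON escaping.
    static <E extends Enum<E>> String write(String lexicalTag, E concept, String conceptKey) {
        StringBuilder sb = new StringBuilder("{ \"tag\" : \"").append(lexicalTag).append('"');
        if (concept != null) {
            sb.append(", \"").append(conceptKey).append("\" : ").append(concept.ordinal());
        }
        return sb.append(" }").toString();
    }

    // Resolves an ordinal read from JSON back to the enum constant.
    static <E extends Enum<E>> E read(Class<E> type, int ordinal) {
        E[] values = type.getEnumConstants();
        if (ordinal < 0 || ordinal >= values.length) {
            throw new IllegalArgumentException(
                    "unknown " + type.getSimpleName() + " ordinal " + ordinal);
        }
        return values[ordinal];
    }
}
```

Since Enum.ordinal() depends on declaration order, constants in such an enumeration should only ever be appended; reordering or inserting entries would silently change the meaning of already serialized data.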

best
Rupert

[1] http://olia.nlp2rdf.org/
[2] http://svn.apache.org/repos/asf/stanbol/trunk/enhancer/generic/nlp/src/main/java/org/apache/stanbol/enhancer/nlp/pos/Pos.java

On Sun, Sep 1, 2013 at 8:09 PM, Cristian Petroaca
<cristian.petroaca@gmail.com> wrote:
> Sorry, pressed send too soon :).
>
> Continued :
>
> nsubj(met-4, Mary-1), conj_and(Mary-1, Tom-3), nsubj(met-4, Tom-3),
> root(ROOT-0, met-4), nn(today-6, Danny-5), tmod(met-4, today-6)]
>
> Given this, each "Token" could have an additional dependency
> annotation:
>
> "stanbol.enhancer.nlp.dependency" : {
>     "tag" : //is it necessary?
>     "relations" : [ {
>         "type" : "nsubj", //type of relation
>         "role" : "gov/dep", //whether it is the depender or the dependee
>         "dependencyValue" : "met", //the word with which the token has a relation
>         "dependencyIndexInSentence" : "2" //the index of the dependency in the current sentence
>     },
>     ...
>     ],
>     "class" : "org.apache.stanbol.enhancer.nlp.dependency.DependencyTag"
> }
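The Stanford output quoted above uses the plain-text form relation(governor-index, dependent-index). A minimal parser sketch, assuming exactly that format (TypedDependency and TypedDependencyParser are hypothetical names), turns it into objects and groups the relations by word index, which is roughly the per-Token view the proposed annotation needs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One typed dependency; indexes are 1-based word positions, 0 denotes ROOT.
final class TypedDependency {
    final String relation;
    final String governor;
    final int governorIndex;
    final String dependent;
    final int dependentIndex;

    TypedDependency(String relation, String governor, int governorIndex,
                    String dependent, int dependentIndex) {
        this.relation = relation;
        this.governor = governor;
        this.governorIndex = governorIndex;
        this.dependent = dependent;
        this.dependentIndex = dependentIndex;
    }
}

final class TypedDependencyParser {
    // relation(governor-index, dependent-index); lazy groups allow hyphens inside words
    private static final Pattern DEP =
            Pattern.compile("(\\w+)\\((.+?)-(\\d+), (.+?)-(\\d+)\\)");

    static List<TypedDependency> parse(String text) {
        List<TypedDependency> result = new ArrayList<>();
        Matcher m = DEP.matcher(text);
        while (m.find()) {
            result.add(new TypedDependency(m.group(1),
                    m.group(2), Integer.parseInt(m.group(3)),
                    m.group(4), Integer.parseInt(m.group(5))));
        }
        return result;
    }

    // Attaches each dependency to both word indexes it touches (as governor and as
    // dependent), i.e. the per-Token view with a "gov"/"dep" role.
    static Map<Integer, List<TypedDependency>> byWordIndex(List<TypedDependency> deps) {
        Map<Integer, List<TypedDependency>> index = new TreeMap<>();
        for (TypedDependency d : deps) {
            index.computeIfAbsent(d.governorIndex, k -> new ArrayList<>()).add(d);
            index.computeIfAbsent(d.dependentIndex, k -> new ArrayList<>()).add(d);
        }
        return index;
    }
}
```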
>
> 2013/9/1 Cristian Petroaca <cristian.petroaca@gmail.com>
>
>> Related to the Stanford Dependency Tree feature, this is what the output
>> from the tool looks like for the sentence "Mary and Tom met Danny today":
>>
>>
>> 2013/8/30 Cristian Petroaca <cristian.petroaca@gmail.com>
>>
>>> Hi Rupert,
>>>
>>> Ok, so after looking at the JSON output from the Stanford NLP Server and
>>> the coref module I'm thinking I can represent the coreference information
>>> this way:
>>> Each "Token" or "Chunk" will contain an additional coref annotation with
>>> the following structure :
>>>
>>> "stanbol.enhancer.nlp.coref" : {
>>>     "tag" : //does this need to exist?
>>>     "isRepresentative" : true/false, //whether this token or chunk is the representative mention in the chain
>>>     "mentions" : [ {
>>>         "sentenceNo" : 1, //the sentence in which the mention is found
>>>         "startWord" : 2, //the first word making up the mention
>>>         "endWord" : 3 //the last word making up the mention
>>>     }, ...
>>>     ],
>>>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
>>> }
>>>
>>> The CorefTag should resemble this model.
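A hedged Java sketch of the proposed model, assuming a coreference chain arrives as a list of mentions plus its representative mention (Mention, CorefAnnotation and CorefChains are hypothetical names). Each mention gets an annotation carrying the representative flag and links to the other mentions of its chain:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Mention position as proposed: sentence number plus first and last word.
final class Mention {
    final int sentenceNo;
    final int startWord;
    final int endWord;

    Mention(int sentenceNo, int startWord, int endWord) {
        this.sentenceNo = sentenceNo;
        this.startWord = startWord;
        this.endWord = endWord;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Mention)) return false;
        Mention m = (Mention) o;
        return sentenceNo == m.sentenceNo && startWord == m.startWord && endWord == m.endWord;
    }

    @Override
    public int hashCode() {
        return Objects.hash(sentenceNo, startWord, endWord);
    }
}

// The annotation attached to one Token/Chunk: representative flag plus the other mentions.
final class CorefAnnotation {
    final boolean representative;
    final List<Mention> mentions;

    CorefAnnotation(boolean representative, List<Mention> mentions) {
        this.representative = representative;
        this.mentions = Collections.unmodifiableList(new ArrayList<>(mentions));
    }
}

final class CorefChains {
    // Builds one annotation per mention of a chain; each one links to all other mentions.
    static Map<Mention, CorefAnnotation> annotate(List<Mention> chain, Mention representative) {
        Map<Mention, CorefAnnotation> result = new LinkedHashMap<>();
        for (Mention m : chain) {
            List<Mention> others = new ArrayList<>(chain);
            others.remove(m); // removes the first equal element, i.e. the mention itself
            result.put(m, new CorefAnnotation(m.equals(representative), others));
        }
        return result;
    }
}
```

Storing the chain on every mention keeps lookups local to a Token or Chunk, at the cost of repeating the chain once per mention.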
>>>
>>> What do you think?
>>>
>>> Cristian
>>>
>>>
>>> 2013/8/24 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>
>>>> Hi Cristian,
>>>>
>>>> you cannot call StanfordNLP components directly from Stanbol; instead you
>>>> have to extend the RESTful service to include the information you
>>>> need. The main reason for that is that the license of StanfordNLP is
>>>> not compatible with the Apache Software License, so Stanbol cannot
>>>> directly link to the StanfordNLP API.
>>>>
>>>> You will need to
>>>>
>>>> 1. define an additional class {yourTag} that extends Tag<{yourType}>
>>>> in the o.a.s.enhancer.nlp module
>>>> 2. add JSON parsing and serialization support for this tag to the
>>>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>>>
>>>> As (1) would be necessary anyway, the only additional thing you need to
>>>> develop is (2). After that you can add {yourTag} instances to the
>>>> AnalyzedText in the StanfordNLP integration. The
>>>> RestfulNlpAnalysisEngine will parse them from the response. All
>>>> engines executed after the RestfulNlpAnalysisEngine will have access
>>>> to your annotations.
>>>>
>>>> If you have a design for {yourTag} - the model you would like to use
>>>> to represent your data - I can help with (1) and (2).
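The two steps can be pictured as a tag class plus a serializer/parser pair selected by the "class" key of the JSON. In the sketch below TagSupport, TagSupportRegistry, DemoTag and DemoTagSupport are hypothetical, a Map stands in for the JSON object, and it does not reproduce the actual Stanbol nlp-json API (PosTagSupport there shows the real pattern):

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical serializer/parser pair for one tag class; a Map stands in for the JSON object.
interface TagSupport<T> {
    Class<T> type();
    Map<String, Object> serialize(T tag);
    T parse(Map<String, Object> json);
}

// Hypothetical registry that selects the support via the "class" key seen in the JSON examples.
final class TagSupportRegistry {
    private final Map<String, TagSupport<?>> byClassName = new HashMap<>();

    void register(TagSupport<?> support) {
        byClassName.put(support.type().getName(), support);
    }

    <T> Map<String, Object> serialize(T tag) {
        @SuppressWarnings("unchecked") // the key is the runtime class of tag, so T matches
        TagSupport<T> support = (TagSupport<T>) byClassName.get(tag.getClass().getName());
        if (support == null) {
            throw new IllegalStateException("no TagSupport for " + tag.getClass().getName());
        }
        Map<String, Object> json = new LinkedHashMap<>(support.serialize(tag));
        json.put("class", tag.getClass().getName());
        return json;
    }

    Object parse(Map<String, Object> json) {
        Object className = json.get("class");
        TagSupport<?> support = byClassName.get(String.valueOf(className));
        if (support == null) {
            throw new IllegalStateException("no TagSupport for " + className);
        }
        return support.parse(json);
    }
}

// Step (1): a minimal tag class (stand-in for "{yourTag} extends Tag<{yourType}>").
final class DemoTag {
    final String tag;
    DemoTag(String tag) { this.tag = tag; }
}

// Step (2): its JSON support.
final class DemoTagSupport implements TagSupport<DemoTag> {
    public Class<DemoTag> type() { return DemoTag.class; }

    public Map<String, Object> serialize(DemoTag t) {
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("tag", t.tag);
        return json;
    }

    public DemoTag parse(Map<String, Object> json) {
        return new DemoTag((String) json.get("tag"));
    }
}
```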
>>>>
>>>> best
>>>> Rupert
>>>>
>>>>
>>>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>>>> <cristian.petroaca@gmail.com> wrote:
>>>> > Hi Rupert,
>>>> >
>>>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see that
>>>> > Stanford NLP is not implemented as an EnhancementEngine but rather is
>>>> > used directly in a Jetty server instance. How does that fit into the
>>>> > Stanbol stack? For example, how can I call the StanfordNlpAnalyzer's routine
>>>> > from my TripleExtractionEnhancementEngine, which lives in the Stanbol stack?
>>>> >
>>>> > Thanks,
>>>> > Cristian
>>>> >
>>>> >
>>>> > 2013/8/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>> >
>>>> >> Hi Cristian,
>>>> >>
>>>> >> Sorry for the late response, but I was offline for the last two weeks.
>>>> >>
>>>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>>>> >> <cristian.petroaca@gmail.com> wrote:
>>>> >> > Hi Rupert,
>>>> >> >
>>>> >> > After doing some tests it seems that the Stanford NLP coreference module
>>>> >> > is much more accurate than the Open NLP one. So I decided to extend
>>>> >> > Stanford NLP to add coreference there.
>>>> >>
>>>> >> The Stanford NLP integration is not part of the Stanbol codebase
>>>> >> because the licenses are not compatible.
>>>> >>
>>>> >> You can find the Stanford NLP integration on
>>>> >>
>>>> >>     https://github.com/westei/stanbol-stanfordnlp
>>>> >>
>>>> >> just create a fork and send pull requests.
>>>> >>
>>>> >>
>>>> >> > Could you add the necessary projects to the branch? And also remove the
>>>> >> > Open NLP ones?
>>>> >> >
>>>> >>
>>>> >> Currently the branch
>>>> >>
>>>> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>>> >>
>>>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
>>>> >> be enough for adding coreference support.
>>>> >>
>>>> >> IMO you will need to
>>>> >>
>>>> >> * add a model for representing coreference to the nlp module
>>>> >> * add parsing and serializing support to the nlp-json module
>>>> >> * add the implementation to your fork of the stanbol-stanfordnlp project
>>>> >>
>>>> >> best
>>>> >> Rupert
>>>> >>
>>>> >>
>>>> >>
>>>> >> > Thanks,
>>>> >> > Cristian
>>>> >> >
>>>> >> >
>>>> >> > 2013/7/5 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>> >> >
>>>> >> >> Hi Cristian,
>>>> >> >>
>>>> >> >> I created the branch at
>>>> >> >>
>>>> >> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>>>> >> >>
>>>> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know if
>>>> >> >> you would like to have more.
>>>> >> >>
>>>> >> >> best
>>>> >> >> Rupert
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>>>> >> >> <cristian.petroaca@gmail.com> wrote:
>>>> >> >> > Hi Rupert,
>>>> >> >> >
>>>> >> >> > I created the JIRA issues https://issues.apache.org/jira/browse/STANBOL-1132 and
>>>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The original one is
>>>> >> >> > dependent upon these.
>>>> >> >> > Please let me know when I can start using the branch.
>>>> >> >> >
>>>> >> >> > Thanks,
>>>> >> >> > Cristian
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > 2013/6/27 Cristian Petroaca <cristian.petroaca@gmail.com>
>>>> >> >> >
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>> >> >> >>
>>>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>>>> >> >> >>> <cristian.petroaca@gmail.com> wrote:
>>>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford, in my previous e-mail.
>>>> >> >> >>> > By the way, does Open NLP have the ability to build dependency trees?
>>>> >> >> >>> >
>>>> >> >> >>>
>>>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>>>> >> >> >>>
>>>> >> >> >>
>>>> >> >> >> Then, since the Stanford NLP lib is also integrated into Stanbol, I'll
>>>> >> >> >> take a look at how I can extend its integration to include the
>>>> >> >> >> dependency tree feature.
>>>> >> >> >>
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >>> >
>>>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petroaca@gmail.com>
>>>> >> >> >>> >
>>>> >> >> >>> >> Hi Rupert,
>>>> >> >> >>> >>
>>>> >> >> >>> >> I created jira https://issues.apache.org/jira/browse/STANBOL-1121.
>>>> >> >> >>> >> As you suggested I would start with extending the Stanford NLP with
>>>> >> >> >>> >> co-reference resolution, but I think also with dependency trees, because
>>>> >> >> >>> >> I also need to know the subject of the sentence and the object that it
>>>> >> >> >>> >> affects, right?
>>>> >> >> >>> >>
>>>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol for
>>>> >> >> >>> >> co-reference and dependency trees, how do I proceed with this? Do I
>>>> >> >> >>> >> create 2 new sub-tasks to the already opened Jira? After that, can I
>>>> >> >> >>> >> start implementing on my local copy of Stanbol and, when I'm done, send
>>>> >> >> >>> >> you guys the patch for review?
>>>> >> >> >>> >>
>>>> >> >> >>>
>>>> >> >> >>> I would create two "New Feature" type issues, one for adding support
>>>> >> >> >>> for "dependency trees" and the other for "co-reference" support. You
>>>> >> >> >>> should also define "depends on" relations between STANBOL-1121 and
>>>> >> >> >>> those two new issues.
>>>> >> >> >>>
>>>> >> >> >>> Sub-tasks could also work, but as adding those features would also be
>>>> >> >> >>> interesting for other things, I would rather define them as separate
>>>> >> >> >>> issues.
>>>> >> >> >>>
>>>> >> >> >>>
>>>> >> >> >> 2 New Features connected with the original jira it is then.
>>>> >> >> >>
>>>> >> >> >>
>>>> >> >> >>> If you would prefer to work in your own branch please tell me. This
>>>> >> >> >>> could have the advantage that patches would not be affected by
>>>> >> >> >>> changes in the trunk.
>>>> >> >> >>>
>>>> >> >> >>
>>>> >> >> >> Yes, a separate branch sounds good.
>>>> >> >> >>
>>>> >> >> >>> best
>>>> >> >> >>> Rupert
>>>> >> >> >>>
>>>> >> >> >>> >> Regards,
>>>> >> >> >>> >> Cristian
>>>> >> >> >>> >>
>>>> >> >> >>> >>
>>>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>> >> >> >>> >>
>>>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>>>> >> >> >>> >>> <cristian.petroaca@gmail.com> wrote:
>>>> >> >> >>> >>> > Hi Rupert,
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > Agreed on the
>>>> >> >> >>> >>> > SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
>>>> >> >> >>> >>> > data structure.
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > Should I open a Jira issue for all of this in order to capture this
>>>> >> >> >>> >>> > information and establish the goals and the initial steps towards
>>>> >> >> >>> >>> > these goals?
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> > How should I proceed further? Should I create some design documents
>>>> >> >> >>> >>> > that need to be reviewed?
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> Usually it is best to write design-related text directly in JIRA
>>>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later to use this
>>>> >> >> >>> >>> text directly for the documentation on the Stanbol webpage.
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> best
>>>> >> >> >>> >>> Rupert
>>>> >> >> >>> >>>
>>>> >> >> >>> >>>
>>>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > Regards,
>>>> >> >> >>> >>> > Cristian
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>> >> >> >>> >>> >
>>>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>>>> >> >> >>> >>> >> <cristian.petroaca@gmail.com> wrote:
>>>> >> >> >>> >>> >> > Hi Rupert,
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> >> Hi Cristian, all
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> really interesting use case!
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could
>>>> >> >> >>> >>> >> >> work out. These suggestions are mainly based on experiences and
>>>> >> >> >>> >>> >> >> lessons learned in the LIVE [2] project, where we built an
>>>> >> >> >>> >>> >> >> information system for the Olympic Games in Peking. While this
>>>> >> >> >>> >>> >> >> project excluded the extraction of Events from unstructured text
>>>> >> >> >>> >>> >> >> (because the Olympic Information System was already providing event
>>>> >> >> >>> >>> >> >> data as XML messages), the semantic search capabilities of this
>>>> >> >> >>> >>> >> >> system were very similar to the ones described by your use case.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, but a formal
>>>> >> >> >>> >>> >> >> representation of the situation described by the text. So let's assume
>>>> >> >> >>> >>> >> >> that the goal is to annotate a Setting (or Situation) described in the
>>>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to model
>>>> >> >> >>> >>> >> >> those. The important relation for modeling this is Participation:
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> where ..
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do have an identity, so we
>>>> >> >> >>> >>> >> >> would typically refer to them as Entities referenced by a setting.
>>>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as well as
>>>> >> >> >>> >>> >> >> social objects.
>>>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents): Perdurants are entities that
>>>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ...
>>>> >> >> >>> >>> >> >>  * PC is Participation: it is a time-indexed relation where
>>>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Modeling this in RDF requires defining some intermediate resources
>>>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to define one resource
>>>> >> >> >>> >>> >> >> being the context for all described data. I would call this
>>>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a sub-concept of
>>>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancements about the extracted Setting
>>>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate that an Endurant is
>>>> >> >> >>> >>> >> >> participating in a setting (fise:in-setting fise:SettingAnnotation).
>>>> >> >> >>> >>> >> >> The Endurant itself is described by existing fise:TextAnnotation (the
>>>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested Entities). Basically
>>>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an EnhancementEngine to
>>>> >> >> >>> >>> >> >> state that several mentions (in possibly different sentences) do
>>>> >> >> >>> >>> >> >> represent the same Endurant as participating in the Setting. In
>>>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type property (similar as
>>>> >> >> >>> >>> >> >> for fise:TextAnnotation) to refer to the role(s) of a participant
>>>> >> >> >>> >>> >> >> (e.g. the set: Agent (intentionally performs an action), Cause
>>>> >> >> >>> >>> >> >> (unintentionally, e.g. a mud slide), Patient (a passive role in an
>>>> >> >> >>> >>> >> >> activity) and Instrument (aids a process)), but I am wondering if one
>>>> >> >> >>> >>> >> >> could extract that information.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>  * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
>>>> >> >> >>> >>> >> >> context of the Setting. Also fise:OccurrentAnnotation can link to
>>>> >> >> >>> >>> >> >> fise:TextAnnotation (typically verbs in the text defining the
>>>> >> >> >>> >>> >> >> perdurant) as well as fise:EntityAnnotation suggesting well-known
>>>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. an election in a country, or an
>>>> >> >> >>> >>> >> >> uprising ...). In addition fise:OccurrentAnnotation can define
>>>> >> >> >>> >>> >> >> dc:has-participant links to fise:ParticipantAnnotation. In this case
>>>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
>>>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
>>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are temporally indexed, this
>>>> >> >> >>> >>> >> >> annotation should also support properties for defining the
>>>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> > Indeed, an event-based data structure makes a lot of sense, with the
>>>> >> >> >>> >>> >> > remark that you probably won't always be able to extract the date for
>>>> >> >> >>> >>> >> > a given setting (situation).
>>>> >> >> >>> >>> >> > There are 2 things which are unclear though.
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > 1. Perdurant: You could have situations in which the object upon which
>>>> >> >> >>> >>> >> > the Subject (or Endurant) is acting is not a transitory object (such as
>>>> >> >> >>> >>> >> > an event or activity) but rather another Endurant. For example we can
>>>> >> >> >>> >>> >> > have the phrase "USA invades Irak" where "USA" is the Endurant (Subject)
>>>> >> >> >>> >>> >> > which performs the action of "invading" on another Endurant, namely
>>>> >> >> >>> >>> >> > "Irak".
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both are
>>>> >> >> >>> >>> >> Endurants. The activity "invading" would be the Perdurant. So ideally
>>>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" with:
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the dc:type caos:Agent,
>>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a fise:EntityAnnotation
>>>> >> >> >>> >>> >> linking to dbpedia:United_States
>>>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type caos:Patient,
>>>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>>>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq
>>>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the dc:type
>>>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for "invades"
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and the Object, come into
>>>> >> >> >>> >>> >> > this? I imagined that the Endurant would have a dc:"property" where the
>>>> >> >> >>> >>> >> > property = verb which links to the Object in noun form. For example take
>>>> >> >> >>> >>> >> > again the sentence "USA invades Irak". You would have the "USA" Entity
>>>> >> >> >>> >>> >> > with dc:invader which points to the Object "Irak". The Endurant would
>>>> >> >> >>> >>> >> > have as many dc:"property" elements as there are verbs which link it to
>>>> >> >> >>> >>> >> > an Object.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> As explained above you would have a fise:OccurrentAnnotation that
>>>> >> >> >>> >>> >> represents the Perdurant. The information that the activity mentioned in
>>>> >> >> >>> >>> >> the text is "invades" would be provided by linking to a fise:TextAnnotation.
>>>> >> >> >>> >>> >> If you can also provide an Ontology for Tasks that defines
>>>> >> >> >>> >>> >> "myTasks:invade", the fise:OccurrentAnnotation could also link to a
>>>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> best
>>>> >> >> >>> >>> >> Rupert
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> >> ### Consuming the data:
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> I think this model should be sufficient for use-cases as described by
>>>> >> >> >>> >>> >> >> you.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Users would be able to consume data on the setting level. This can be
>>>> >> >> >>> >>> >> >> done by simply retrieving all fise:ParticipantAnnotation as well as
>>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. BTW this was the
>>>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It allows queries for
>>>> >> >> >>> >>> >> >> Settings that involve specific Entities, e.g. you could filter for
>>>> >> >> >>> >>> >> >> Settings that involve a {Person}, activities:Arrested and a specific
>>>> >> >> >>> >>> >> >> {Uprising}. However, note that with this approach you will also get
>>>> >> >> >>> >>> >> >> results for Settings where the {Person} participated and another person
>>>> >> >> >>> >>> >> >> was arrested.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Another possibility would be to process enhancement results on the
>>>> >> >> >>> >>> >> >> fise:OccurrentAnnotation level. This would allow a much higher
>>>> >> >> >>> >>> >> >> granularity (e.g. it would allow the query used as an example above to
>>>> >> >> >>> >>> >> >> be answered correctly). But I am wondering if the quality of the
>>>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I also have doubts
>>>> >> >> >>> >>> >> >> whether this can still be realized by using semantic indexing with
>>>> >> >> >>> >>> >> >> Apache Solr, or if it would be better/necessary to store results in a
>>>> >> >> >>> >>> >> >> TripleStore and use SPARQL for retrieval.
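To make the difference between the two consumption levels concrete, here is a hedged Java sketch of both query styles over an in-memory model (Setting, Occurrent and SettingQueries are hypothetical; entity and activity IDs are plain strings). With person:A in the setting and person:B as the one arrested, the setting-level query returns the setting while the occurrent-level query does not:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// An activity (Perdurant) together with the entities participating in it.
final class Occurrent {
    final String activity;
    final Set<String> participants;

    Occurrent(String activity, String... participants) {
        this.activity = activity;
        this.participants = new HashSet<>(Arrays.asList(participants));
    }
}

// A setting: all participants plus the occurrents described in it.
final class Setting {
    final String id;
    final Set<String> participants = new HashSet<>();
    final List<Occurrent> occurrents = new ArrayList<>();

    Setting(String id) { this.id = id; }

    Setting addParticipant(String entity) { participants.add(entity); return this; }

    Setting add(Occurrent o) {
        occurrents.add(o);
        participants.addAll(o.participants);
        return this;
    }
}

final class SettingQueries {
    // Setting level: entity and activity only need to occur in the same setting.
    static List<String> settingLevel(List<Setting> settings, String entity, String activity) {
        List<String> hits = new ArrayList<>();
        for (Setting s : settings) {
            boolean hasActivity = s.occurrents.stream().anyMatch(o -> o.activity.equals(activity));
            if (hasActivity && s.participants.contains(entity)) hits.add(s.id);
        }
        return hits;
    }

    // Occurrent level: the entity must participate in that very activity.
    static List<String> occurrentLevel(List<Setting> settings, String entity, String activity) {
        List<String> hits = new ArrayList<>();
        for (Setting s : settings) {
            boolean match = s.occurrents.stream()
                    .anyMatch(o -> o.activity.equals(activity) && o.participants.contains(entity));
            if (match) hits.add(s.id);
        }
        return hits;
    }
}
```

The occurrent-level query is only as good as the participant links extracted per activity, which is exactly the quality concern raised above.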
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] are also very
>>>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7, SPOTL(X) Representation).
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities (especially
>>>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings extracted from Documents.
>>>> >> >> >>> >>> >> >> By definition, in DOLCE, Perdurants are temporally indexed. That
>>>> >> >> >>> >>> >> >> means that at the time they are added to a knowledge base they might
>>>> >> >> >>> >>> >> >> still be in progress. So the creation, enrichment and refinement of
>>>> >> >> >>> >>> >> >> such Entities in the knowledge base seems to be critical for a system
>>>> >> >> >>> >>> >> >> like the one described in your use-case.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>>>> >> >> >>> >>> >> >> <cristian.petroaca@gmail.com> wrote:
>>>> >> >> >>> >>> >> >> >
>>>> >> >> >>> >>> >> >> > First of all I have to mention that I am new to the field of semantic
>>>> >> >> >>> >>> >> >> > technologies; I started to read about them in the last 4-5 months.
>>>> >> >> >>> >>> >> >> > Having said that, I have a high-level overview of what a good approach
>>>> >> >> >>> >>> >> >> > to solve this problem is. There are a number of papers on the internet
>>>> >> >> >>> >>> >> >> > which describe what steps need to be taken, such as: named entity
>>>> >> >> >>> >>> >> >> > recognition, co-reference resolution, POS tagging and others.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only supports sentence
>>>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, chunking, NER and lemma. Support
>>>> >> >> >>> >>> >> >> for co-reference resolution and dependency trees is currently missing.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the moment it
>>>> >> >> >>> >>> >> >> only supports English, but I am already working to include the other
>>>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that are already integrated
>>>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. But note that for all
>>>> >> >> >>> >>> >> >> of those the integration excludes support for co-reference and
>>>> >> >> >>> >>> >> >> dependency trees.
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> Anyway, I am confident that one can implement a first prototype by
>>>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available - Chunks (e.g.
>>>> >> >> >>> >>> >> >> Noun phrases).
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like Relation extraction
>>>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>>>> >> >> >>> >>> >> > What kind of effort would be required for integrating a co-reference
>>>> >> >> >>> >>> >> > resolution tool into Stanbol?
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> Yes, in the end it would be an EnhancementEngine. But before we can
>>>> >> >> >>> >>> >> build such an engine we would need to
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with Annotations for co-reference
>>>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those annotations so
>>>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide co-reference
>>>> >> >> >>> >>> >> information
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the extracted
>>>> >> >> >>> >>> >> > information. I'll take a closer look at DOLCE.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to represent
>>>> >> >> >>> >>> >> Events will only pay off if we can also successfully extract such
>>>> >> >> >>> >>> >> information from processed texts.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> I would start with
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>  * fise:SettingAnnotation
>>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if there
>>>> >> >> >>> >>> >>       are more suggestions)
>>>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument,
>>>> >> >> >>> >>> >>       fise:Cause
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>>>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>>>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>>>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>>>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> If it turns out that we can extract more, we can add more
>>>> >> >> >>> >>> >> structure to those annotations. We might also think about using
>>>> >> >> >>> >>> >> a separate namespace for those extensions to the annotation
>>>> >> >> >>> >>> >> structure.
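As a rough illustration of the annotation structure proposed above, here is a plain-Java sketch that emits one "subject predicate object" line per statement for a setting with its participants and occurrents. The fise:* properties and types are the proposals from this thread rather than an existing vocabulary, the ex:* identifiers are made up, and the {fise:Enhancement} metadata is omitted. A real engine would instead add RDF triples to the content item's metadata graph.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed event annotation structure as plain Java objects.
// The fise:* terms are the ones proposed in this thread, not an existing Stanbol vocabulary,
// and the {fise:Enhancement} metadata (creator, created, ...) is left out for brevity.
public class EventAnnotationSketch {

    /** Proposed dc:type values of a fise:ParticipantAnnotation. */
    enum Role {
        AGENT("fise:Agent"), PATIENT("fise:Patient"), INSTRUMENT("fise:Instrument"), CAUSE("fise:Cause");

        final String curie;

        Role(String curie) {
            this.curie = curie;
        }
    }

    static final class Participant {
        final String id;
        final String mention;                                // id of a fise:TextAnnotation
        final Role role;
        final List<String> suggestions = new ArrayList<>();  // ids of fise:EntityAnnotations

        Participant(String id, String mention, Role role) {
            this.id = id;
            this.mention = mention;
            this.role = role;
        }
    }

    static final class Occurrent {
        final String id;
        final String mention;                                // id of a fise:TextAnnotation

        Occurrent(String id, String mention) {
            this.id = id;
            this.mention = mention;
        }
    }

    /** Emits "subject predicate object" lines for one setting and everything in it. */
    static List<String> toTriples(String settingId, List<Participant> participants,
                                  List<Occurrent> occurrents) {
        List<String> t = new ArrayList<>();
        t.add(settingId + " a fise:SettingAnnotation");
        for (Participant p : participants) {
            t.add(p.id + " a fise:ParticipantAnnotation");
            t.add(p.id + " fise:inSetting " + settingId);
            t.add(p.id + " fise:hasMention " + p.mention);
            for (String s : p.suggestions) {
                t.add(p.id + " fise:suggestion " + s);   // one statement per suggestion
            }
            t.add(p.id + " dc:type " + p.role.curie);
        }
        for (Occurrent o : occurrents) {
            t.add(o.id + " a fise:OccurrentAnnotation");
            t.add(o.id + " fise:inSetting " + settingId);
            t.add(o.id + " fise:hasMention " + o.mention);
            t.add(o.id + " dc:type fise:Activity");
        }
        return t;
    }

    public static void main(String[] args) {
        Participant apple = new Participant("ex:p1", "ex:ta1", Role.AGENT);
        apple.suggestions.add("ex:ea1");
        Occurrent acquired = new Occurrent("ex:o1", "ex:ta2");
        toTriples("ex:s1", List.of(apple), List.of(acquired)).forEach(System.out::println);
    }
}
```

Keeping the role as an enum mirrors how the Stanbol NLP module maps controlled vocabularies to Java enumerations, so consumers do not have to compare raw strings.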
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> > 2. Determine how all of this should be integrated into Stanbol.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an enhancement
>>>> >> >> >>> >>> >> chain that does NLP processing and EntityLinking.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> You should have a look at
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> * SentimentSummarizationEngine [1], as it does a lot of things
>>>> >> >> >>> >>> >>   with NLP processing results (e.g. connecting adjectives (via
>>>> >> >> >>> >>> >>   verbs) to nouns/pronouns). So as long as we cannot use
>>>> >> >> >>> >>> >>   explicit dependency trees, your code will need to do similar
>>>> >> >> >>> >>> >>   things with Nouns, Pronouns and Verbs.
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java representation
>>>> >> >> >>> >>> >>   of the present fise:TextAnnotation and fise:EntityAnnotation
>>>> >> >> >>> >>> >>   [2]. Something similar will also be required by the
>>>> >> >> >>> >>> >>   EventExtractionEngine for fast access to such annotations
>>>> >> >> >>> >>> >>   while iterating over the Sentences of the text.
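For the fast per-sentence access mentioned in the second point, one possible approach is to index text annotations by their start offset in a sorted map, so that each sentence only needs a range query. This is an assumption for illustration and not how DisambiguationData actually works; TextAnn and AnnotationIndexSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: an offset index over text annotations so that all annotations of
// a sentence can be fetched while iterating over sentences. TextAnn is illustrative and
// is NOT the structure used by the Disambiguation-MLT engine.
public class AnnotationIndexSketch {

    static final class TextAnn {
        final String id;
        final int start;   // inclusive character offset
        final int end;     // exclusive character offset

        TextAnn(String id, int start, int end) {
            this.id = id;
            this.start = start;
            this.end = end;
        }
    }

    // annotations grouped by start offset, kept sorted by the TreeMap
    private final TreeMap<Integer, List<TextAnn>> byStart = new TreeMap<>();

    void add(TextAnn a) {
        byStart.computeIfAbsent(a.start, k -> new ArrayList<>()).add(a);
    }

    /** Annotations lying completely inside the sentence span [start, end). */
    List<TextAnn> within(int start, int end) {
        List<TextAnn> result = new ArrayList<>();
        // subMap limits the scan to annotations starting inside the sentence
        for (List<TextAnn> group : byStart.subMap(start, true, end, false).values()) {
            for (TextAnn a : group) {
                if (a.end <= end) {
                    result.add(a);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AnnotationIndexSketch index = new AnnotationIndexSketch();
        index.add(new TextAnn("ta1", 0, 5));
        index.add(new TextAnn("ta2", 6, 14));
        index.add(new TextAnn("ta3", 20, 27));
        int[][] sentences = { {0, 15}, {15, 30} };
        for (int[] s : sentences) {
            List<String> ids = new ArrayList<>();
            for (TextAnn a : index.within(s[0], s[1])) {
                ids.add(a.id);
            }
            System.out.println("[" + s[0] + "," + s[1] + ") -> " + ids);
        }
        // prints [0,15) -> [ta1, ta2]
        //        [15,30) -> [ta3]
    }
}
```

With annotations sorted by start offset, each sentence lookup costs a logarithmic seek plus the matching entries, instead of a scan over all annotations of the text.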
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> best
>>>> >> >> >>> >>> >> Rupert
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>>>> >> >> >>> >>> >> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>>>> >> >> >>> >>> >>
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > Thanks
>>>> >> >> >>> >>> >> >
>>>> >> >> >>> >>> >> > Hope this helps to bootstrap this discussion
>>>> >> >> >>> >>> >> >> best
>>>> >> >> >>> >>> >> >> Rupert
>>>> >> >> >>> >>> >> >>
>>>> >> >> >>> >>> >> >> --
>>>> >> >> >>> >>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>>>> >> >> >>> >>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>>>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>>>> >> >> >>> >>> >> >>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
