stanbol-dev mailing list archives

From Cristian Petroaca <cristian.petro...@gmail.com>
Subject Re: Relation extraction feature
Date Sun, 01 Sep 2013 17:56:13 GMT
Related to the Stanford Dependency Tree feature, this is what the output
from the tool looks like for the sentence "Mary and Tom met Danny today":


2013/8/30 Cristian Petroaca <cristian.petroaca@gmail.com>

> Hi Rupert,
>
> Ok, so after looking at the JSON output from the Stanford NLP Server and
> the coref module I'm thinking I can represent the coreference information
> this way:
> Each "Token" or "Chunk" will contain an additional coref annotation with
> the following structure:
>
> "stanbol.enhancer.nlp.coref" : {
>     "tag" : ..., // does this need to exist?
>     "isRepresentative" : true/false, // whether this token or chunk is the representative mention in the chain
>     "mentions" : [ {
>             "sentenceNo" : 1, // the sentence in which the mention is found
>             "startWord" : 2,  // the first word making up the mention
>             "endWord" : 3     // the last word making up the mention
>         }, ...
>     ],
>     "class" : "org.apache.stanbol.enhancer.nlp.coref.CorefTag"
> }
>
> The CorefTag should resemble this model.
>
> What do you think?
>
> Cristian
>
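One possible Java shape for the proposed coref value is sketched below. It is a minimal, dependency-free assumption: it does not extend Stanbol's Tag class (that step is discussed in Rupert's reply quoted next), the names CorefTag and Mention are hypothetical, and endWord is treated as inclusive, following the "last word making up the mention" comment in the JSON.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One mention of a coreference chain (hypothetical model mirroring the proposed JSON). */
final class Mention {
    final int sentenceNo; // the sentence in which the mention is found
    final int startWord;  // the first word making up the mention
    final int endWord;    // the last word making up the mention (inclusive)

    Mention(int sentenceNo, int startWord, int endWord) {
        if (startWord < 0 || endWord < startWord) {
            throw new IllegalArgumentException("invalid word range: " + startWord + ".." + endWord);
        }
        this.sentenceNo = sentenceNo;
        this.startWord = startWord;
        this.endWord = endWord;
    }
}

/** Hypothetical value of the "stanbol.enhancer.nlp.coref" annotation on a Token or Chunk. */
final class CorefTag {
    private final boolean representative;
    private final List<Mention> mentions;

    CorefTag(boolean representative, List<Mention> mentions) {
        this.representative = representative;
        // defensive, read-only copy so the annotation value cannot change after creation
        this.mentions = Collections.unmodifiableList(new ArrayList<>(mentions));
    }

    /** Whether this token or chunk is the representative mention of its chain. */
    boolean isRepresentative() {
        return representative;
    }

    /** The mentions listed in the annotation. */
    List<Mention> getMentions() {
        return mentions;
    }
}
```

Keeping the value immutable fits annotations that are attached once by an engine and then only read by later engines.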
>
> 2013/8/24 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>
>> Hi Cristian,
>>
>> you cannot directly call StanfordNLP components from Stanbol; instead you
>> have to extend the RESTful service to include the information you
>> need. The main reason for that is that the license of StanfordNLP is
>> not compatible with the Apache Software License, so Stanbol cannot
>> directly link to the StanfordNLP API.
>>
>> You will need to
>>
>> 1. define an additional class {yourTag} that extends Tag<{yourType}>
>> in the o.a.s.enhancer.nlp module
>> 2. add JSON parsing and serialization support for this tag to the
>> o.a.s.enhancer.nlp.json module (see e.g. PosTagSupport as an example)
>>
>> As (1) would be necessary anyway, the only additional thing you need to
>> develop is (2). After that you can add {yourTag} instances to the
>> AnalysedText in the StanfordNLP integration. The
>> RestfulNlpAnalysisEngine will parse them from the response. All
>> engines executed after the RestfulNlpAnalysisEngine will have access
>> to your annotations.
>>
>> If you have a design for {yourTag} - the model you would like to use
>> to represent your data - I can help with (1) and (2).
>>
>> best
>> Rupert
>>
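Step (2) essentially maps the tag to the JSON structure proposed above. The sketch below covers only the serialization direction and uses plain string building so that it stays self-contained; the actual nlp-json module has its own serializer and parser hooks (see PosTagSupport), which are not reproduced here, and the CorefJson and CorefMention names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical mention value: sentence number plus first and last word of the mention. */
final class CorefMention {
    final int sentenceNo;
    final int startWord;
    final int endWord;

    CorefMention(int sentenceNo, int startWord, int endWord) {
        this.sentenceNo = sentenceNo;
        this.startWord = startWord;
        this.endWord = endWord;
    }
}

/** Hand-written serializer for the proposed coref JSON structure (no JSON library used). */
final class CorefJson {
    static final String ANNOTATION_KEY = "stanbol.enhancer.nlp.coref";
    static final String CLASS_NAME = "org.apache.stanbol.enhancer.nlp.coref.CorefTag";

    /** Serializes the annotation value. Only numbers, booleans and constants are written, so no escaping is needed. */
    static String serializeValue(boolean representative, List<CorefMention> mentions) {
        StringBuilder sb = new StringBuilder("{\"isRepresentative\":").append(representative)
                .append(",\"mentions\":[");
        for (int i = 0; i < mentions.size(); i++) {
            CorefMention m = mentions.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"sentenceNo\":").append(m.sentenceNo)
              .append(",\"startWord\":").append(m.startWord)
              .append(",\"endWord\":").append(m.endWord)
              .append('}');
        }
        return sb.append("],\"class\":\"").append(CLASS_NAME).append("\"}").toString();
    }

    /** Wraps the serialized value under the annotation key, as in the proposal. */
    static String serializeAnnotation(boolean representative, List<CorefMention> mentions) {
        return "{\"" + ANNOTATION_KEY + "\":" + serializeValue(representative, mentions) + "}";
    }

    public static void main(String[] args) {
        System.out.println(serializeAnnotation(true, Arrays.asList(new CorefMention(1, 2, 3))));
    }
}
```

The parsing direction would read the same field names back; a real implementation would rather build on the JSON library the nlp-json module already uses.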
>>
>> On Fri, Aug 23, 2013 at 5:11 PM, Cristian Petroaca
>> <cristian.petroaca@gmail.com> wrote:
>> > Hi Rupert,
>> >
>> > Thanks for the info. Looking at the stanbol-stanfordnlp project I see that
>> > Stanford NLP is not implemented as an EnhancementEngine but rather it
>> > is used directly in a Jetty Server instance. How does that fit into the
>> > Stanbol stack? For example, how can I call the StanfordNlpAnalyzer's
>> > routine from my TripleExtractionEnhancementEngine, which lives in the
>> > Stanbol stack?
>> >
>> > Thanks,
>> > Cristian
>> >
>> >
>> > 2013/8/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >
>> >> Hi Cristian,
>> >>
>> >> Sorry for the late response, but I was offline for the last two weeks.
>> >>
>> >> On Fri, Aug 2, 2013 at 9:19 PM, Cristian Petroaca
>> >> <cristian.petroaca@gmail.com> wrote:
>> >> > Hi Rupert,
>> >> >
>> >> > After doing some tests it seems that the Stanford NLP coreference module
>> >> > is much more accurate than the Open NLP one. So I decided to extend
>> >> > Stanford NLP to add coreference there.
>> >>
>> >> The Stanford NLP integration is not part of the Stanbol codebase
>> >> because the licenses are not compatible.
>> >>
>> >> You can find the Stanford NLP integration on
>> >>
>> >>     https://github.com/westei/stanbol-stanfordnlp
>> >>
>> >> Just create a fork and send pull requests.
>> >>
>> >>
>> >> > Could you add the necessary projects to the branch? And also remove the
>> >> > Open NLP ones?
>> >> >
>> >>
>> >> Currently the branch
>> >>
>> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >>
>> >> only contains the "nlp" and the "nlp-json" modules. IMO those should
>> >> be enough for adding coreference support.
>> >>
>> >> IMO you will need to
>> >>
>> >> * add a model for representing coreference to the nlp module
>> >> * add parsing and serializing support to the nlp-json module
>> >> * add the implementation to your fork of the stanbol-stanfordnlp project
>> >>
>> >> best
>> >> Rupert
>> >>
>> >>
>> >>
>> >> > Thanks,
>> >> > Cristian
>> >> >
>> >> >
>> >> > 2013/7/5 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >
>> >> >> Hi Cristian,
>> >> >>
>> >> >> I created the branch at
>> >> >>
>> >> >>     http://svn.apache.org/repos/asf/stanbol/branches/nlp-dep-tree-and-co-ref/
>> >> >>
>> >> >> ATM it contains only the "nlp" and "nlp-json" modules. Let me know if
>> >> >> you would like to have more.
>> >> >>
>> >> >> best
>> >> >> Rupert
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Thu, Jul 4, 2013 at 10:14 AM, Cristian Petroaca
>> >> >> <cristian.petroaca@gmail.com> wrote:
>> >> >> > Hi Rupert,
>> >> >> >
>> >> >> > I created JIRA issues: https://issues.apache.org/jira/browse/STANBOL-1132 and
>> >> >> > https://issues.apache.org/jira/browse/STANBOL-1133. The original one is
>> >> >> > dependent upon these.
>> >> >> > Please let me know when I can start using the branch.
>> >> >> >
>> >> >> > Thanks,
>> >> >> > Cristian
>> >> >> >
>> >> >> >
>> >> >> > 2013/6/27 Cristian Petroaca <cristian.petroaca@gmail.com>
>> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> 2013/6/27 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >> >>
>> >> >> >>> On Thu, Jun 27, 2013 at 3:12 PM, Cristian Petroaca
>> >> >> >>> <cristian.petroaca@gmail.com> wrote:
>> >> >> >>> > Sorry, I meant the Stanbol NLP API, not Stanford, in my previous e-mail.
>> >> >> >>> > By the way, does Open NLP have the ability to build dependency trees?
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>> AFAIK OpenNLP does not provide this feature.
>> >> >> >>>
>> >> >> >>
>> >> >> >> Then, since the Stanford NLP lib is also integrated into Stanbol, I'll
>> >> >> >> take a look at how I can extend its integration to include the
>> >> >> >> dependency tree feature.
>> >> >> >>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> >
>> >> >> >>> > 2013/6/23 Cristian Petroaca <cristian.petroaca@gmail.com>
>> >> >> >>> >
>> >> >> >>> >> Hi Rupert,
>> >> >> >>> >>
>> >> >> >>> >> I created the JIRA issue https://issues.apache.org/jira/browse/STANBOL-1121.
>> >> >> >>> >> As you suggested, I would start with extending the Stanford NLP with
>> >> >> >>> >> co-reference resolution, but I think also with dependency trees, because
>> >> >> >>> >> I also need to know the Subject of the sentence and the object that it
>> >> >> >>> >> affects, right?
>> >> >> >>> >>
>> >> >> >>> >> Given that I need to extend the Stanford NLP API in Stanbol for
>> >> >> >>> >> co-reference and dependency trees, how do I proceed with this? Do I
>> >> >> >>> >> create 2 new sub-tasks for the already opened Jira? After that, can I
>> >> >> >>> >> start implementing on my local copy of Stanbol and, when I'm done, send
>> >> >> >>> >> you guys the patch for review?
>> >> >> >>> >>
>> >> >> >>>
>> >> >> >>> I would create two "New Feature" type issues: one for adding support
>> >> >> >>> for "dependency trees" and the other for "co-reference" support. You
>> >> >> >>> should also define "depends on" relations between STANBOL-1121 and
>> >> >> >>> those two new issues.
>> >> >> >>>
>> >> >> >>> Sub-tasks could also work, but as adding those features would also be
>> >> >> >>> interesting for other things, I would rather define them as separate
>> >> >> >>> issues.
>> >> >> >>>
>> >> >> >>>
>> >> >> >> 2 New Features connected with the original jira it is then.
>> >> >> >>
>> >> >> >>
>> >> >> >>> If you would prefer to work in your own branch, please tell me. This
>> >> >> >>> could have the advantage that patches would not be affected by changes
>> >> >> >>> in the trunk.
>> >> >> >>>
>> >> >> >> Yes, a separate branch sounds good.
>> >> >> >>
>> >> >> >>> best
>> >> >> >>> Rupert
>> >> >> >>>
>> >> >> >>> >> Regards,
>> >> >> >>> >> Cristian
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> 2013/6/18 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >> >>> >>
>> >> >> >>> >>> On Mon, Jun 17, 2013 at 10:18 PM, Cristian Petroaca
>> >> >> >>> >>> <cristian.petroaca@gmail.com> wrote:
>> >> >> >>> >>> > Hi Rupert,
>> >> >> >>> >>> >
>> >> >> >>> >>> > Agreed on the SettingAnnotation/ParticipantAnnotation/OccurrentAnnotation
>> >> >> >>> >>> > data structure.
>> >> >> >>> >>> >
>> >> >> >>> >>> > Should I open up a Jira for all of this in order to encapsulate this
>> >> >> >>> >>> > information and establish the goals and these initial steps towards
>> >> >> >>> >>> > these goals?
>> >> >> >>> >>>
>> >> >> >>> >>> Yes please. A JIRA issue for this work would be great.
>> >> >> >>> >>>
>> >> >> >>> >>> > How should I proceed further? Should I create some design documents
>> >> >> >>> >>> > that need to be reviewed?
>> >> >> >>> >>>
>> >> >> >>> >>> Usually it is best to write design-related text directly in JIRA
>> >> >> >>> >>> by using Markdown [1] syntax. This will allow us later to use this
>> >> >> >>> >>> text directly for the documentation on the Stanbol Webpage.
>> >> >> >>> >>>
>> >> >> >>> >>> best
>> >> >> >>> >>> Rupert
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>> [1] http://daringfireball.net/projects/markdown/
>> >> >> >>> >>> >
>> >> >> >>> >>> > Regards,
>> >> >> >>> >>> > Cristian
>> >> >> >>> >>> >
>> >> >> >>> >>> >
>> >> >> >>> >>> > 2013/6/17 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >> >>> >>> >
>> >> >> >>> >>> >> On Thu, Jun 13, 2013 at 8:22 PM, Cristian Petroaca
>> >> >> >>> >>> >> <cristian.petroaca@gmail.com> wrote:
>> >> >> >>> >>> >> > Hi Rupert,
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > First of all thanks for the detailed suggestions.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 2013/6/12 Rupert Westenthaler <rupert.westenthaler@gmail.com>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> >> Hi Cristian, all
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> really interesting use case!
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> In this mail I will try to give some suggestions on how this could
>> >> >> >>> >>> >> >> work out. These suggestions are mainly based on experiences and lessons
>> >> >> >>> >>> >> >> learned in the LIVE [2] project, where we built an information system
>> >> >> >>> >>> >> >> for the Olympic Games in Peking. While this project excluded the
>> >> >> >>> >>> >> >> extraction of Events from unstructured text (because the Olympic
>> >> >> >>> >>> >> >> Information System was already providing event data as XML messages),
>> >> >> >>> >>> >> >> the semantic search capabilities of this system were very similar to
>> >> >> >>> >>> >> >> the ones described by your use case.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> IMHO you are not only trying to extract relations, but a formal
>> >> >> >>> >>> >> >> representation of the situation described by the text. So let's assume
>> >> >> >>> >>> >> >> that the goal is to annotate a Setting (or Situation) described in the
>> >> >> >>> >>> >> >> text - a fise:SettingAnnotation.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The DOLCE foundational ontology [1] gives some advice on how to model
>> >> >> >>> >>> >> >> those. The important relation for modeling this is Participation:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>     PC(x, y, t) → (ED(x) ∧ PD(y) ∧ T(t))
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> where:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * ED are Endurants (continuants): Endurants do have an identity, so we
>> >> >> >>> >>> >> >> would typically refer to them as Entities referenced by a setting.
>> >> >> >>> >>> >> >> Note that this includes physical, non-physical as well as
>> >> >> >>> >>> >> >> social objects.
>> >> >> >>> >>> >> >>  * PD are Perdurants (occurrents): Perdurants are entities that
>> >> >> >>> >>> >> >> happen in time. This refers to Events, Activities ...
>> >> >> >>> >>> >> >>  * PC is Participation: a time-indexed relation in which
>> >> >> >>> >>> >> >> Endurants participate in Perdurants
>> >> >> >>> >>> >> >>  * T(t): t is the time at which the participation holds
>> >> >> >>> >>> >> >>
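To make the typing constraint of PC(x, y, t) concrete, the sketch below encodes it with Java types, so that a Participation can only be built from an Endurant, a Perdurant and a time. This is a simplification for illustration: the class names are hypothetical, and modelling t as a single java.time.Instant (rather than a temporal interval) is an assumption.

```java
import java.time.Instant;
import java.util.Objects;

/** ED(x): an entity with an identity, e.g. a person, a country or an organisation. */
final class Endurant {
    final String label;

    Endurant(String label) {
        this.label = Objects.requireNonNull(label, "label");
    }
}

/** PD(y): an entity that happens in time, e.g. an event or an activity. */
final class Perdurant {
    final String label;

    Perdurant(String label) {
        this.label = Objects.requireNonNull(label, "label");
    }
}

/**
 * PC(x, y, t): x participates in y at time t. The parameter types encode
 * ED(x), PD(y) and T(t), so a participation with swapped arguments does not compile.
 */
final class Participation {
    final Endurant participant;
    final Perdurant occurrent;
    final Instant time; // simplification: one instant instead of a temporal region

    Participation(Endurant participant, Perdurant occurrent, Instant time) {
        this.participant = Objects.requireNonNull(participant, "participant");
        this.occurrent = Objects.requireNonNull(occurrent, "occurrent");
        this.time = Objects.requireNonNull(time, "time");
    }

    @Override
    public String toString() {
        return "PC(" + participant.label + ", " + occurrent.label + ", " + time + ")";
    }
}
```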
>> >> >> >>> >>> >> >> Modeling this in RDF requires defining some intermediate resources
>> >> >> >>> >>> >> >> because RDF does not allow for n-ary relations.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * fise:SettingAnnotation: It is really handy to define one resource
>> >> >> >>> >>> >> >> being the context for all described data. I would call this
>> >> >> >>> >>> >> >> "fise:SettingAnnotation" and define it as a sub-concept of
>> >> >> >>> >>> >> >> fise:Enhancement. All further enhancements about the extracted Setting
>> >> >> >>> >>> >> >> would define a "fise:in-setting" relation to it.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * fise:ParticipantAnnotation: Is used to annotate that an Endurant is
>> >> >> >>> >>> >> >> participating in a setting (fise:in-setting fise:SettingAnnotation).
>> >> >> >>> >>> >> >> The Endurant itself is described by existing fise:TextAnnotation (the
>> >> >> >>> >>> >> >> mentions) and fise:EntityAnnotation (suggested Entities). Basically
>> >> >> >>> >>> >> >> the fise:ParticipantAnnotation will allow an EnhancementEngine to
>> >> >> >>> >>> >> >> state that several mentions (in possibly different sentences) do
>> >> >> >>> >>> >> >> represent the same Endurant as participating in the Setting. In
>> >> >> >>> >>> >> >> addition it would be possible to use the dc:type property (similar to
>> >> >> >>> >>> >> >> fise:TextAnnotation) to refer to the role(s) of a participant
>> >> >> >>> >>> >> >> (e.g. the set: Agent (intentionally performs an action), Cause
>> >> >> >>> >>> >> >> (unintentionally, e.g. a mud slide), Patient (a passive role in
>> >> >> >>> >>> >> >> an activity) and Instrument (aids a process)), but I am wondering if
>> >> >> >>> >>> >> >> one could extract that information.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >>  * fise:OccurrentAnnotation: is used to annotate a Perdurant in the
>> >> >> >>> >>> >> >> context of the Setting. fise:OccurrentAnnotation can also link to
>> >> >> >>> >>> >> >> fise:TextAnnotation (typically verbs in the text defining the
>> >> >> >>> >>> >> >> perdurant) as well as to fise:EntityAnnotation suggesting well-known
>> >> >> >>> >>> >> >> Events in a knowledge base (e.g. an Election in a country, or an
>> >> >> >>> >>> >> >> uprising ...). In addition fise:OccurrentAnnotation can define
>> >> >> >>> >>> >> >> dc:has-participant links to fise:ParticipantAnnotation. In this case
>> >> >> >>> >>> >> >> it is explicitly stated that an Endurant (the
>> >> >> >>> >>> >> >> fise:ParticipantAnnotation) is involved in this Perdurant (the
>> >> >> >>> >>> >> >> fise:OccurrentAnnotation). As Occurrences are temporally indexed, this
>> >> >> >>> >>> >> >> annotation should also support properties for defining the
>> >> >> >>> >>> >> >> xsd:dateTime for the start/end.
>> >> >> >>> >>> >> >>
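The n-ary problem can be illustrated with PC(x, y, t) itself: because an RDF triple relates exactly two resources, the three arguments are attached to an intermediate node with one triple each. The sketch below emits such triples as Turtle-style strings; the ex: vocabulary and the method name are hypothetical and merely stand in for the fise: annotations proposed above.

```java
import java.util.ArrayList;
import java.util.List;

final class Reification {
    /**
     * Expresses the ternary PC(x, y, t) as binary triples around an intermediate node.
     * The ex: terms are an illustrative vocabulary, not part of the fise namespace.
     */
    static List<String> reifyParticipation(String node, String endurant, String perdurant, String xsdDateTime) {
        List<String> turtle = new ArrayList<>();
        turtle.add(node + " a ex:Participation .");
        turtle.add(node + " ex:participant " + endurant + " .");
        turtle.add(node + " ex:participatesIn " + perdurant + " .");
        turtle.add(node + " ex:atTime \"" + xsdDateTime + "\"^^xsd:dateTime .");
        return turtle;
    }

    public static void main(String[] args) {
        reifyParticipation("_:pc1", "dbpedia:United_States", "ex:invasion", "2003-03-20T00:00:00Z")
                .forEach(System.out::println);
    }
}
```

The intermediate node can be a blank node or a minted URI; in the proposal above, the fise:SettingAnnotation and fise:ParticipantAnnotation resources play this intermediate role.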
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> > Indeed, an event-based data structure makes a lot of sense, with the
>> >> >> >>> >>> >> > remark that you probably won't always be able to extract the date for a
>> >> >> >>> >>> >> > given setting (situation).
>> >> >> >>> >>> >> > There are 2 things which are unclear though.
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 1. Perdurant: You could have situations in which the object upon which
>> >> >> >>> >>> >> > the Subject (or Endurant) is acting is not a transitory object (such as
>> >> >> >>> >>> >> > an event or activity) but rather another Endurant. For example we can
>> >> >> >>> >>> >> > have the phrase "USA invades Irak" where "USA" is the Endurant (Subject)
>> >> >> >>> >>> >> > which performs the action of "invading" on another Endurant, namely
>> >> >> >>> >>> >> > "Irak".
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> By using CAOS, USA would be the Agent and Iraq the Patient. Both are
>> >> >> >>> >>> >> Endurants. The activity "invading" would be the Perdurant. So ideally
>> >> >> >>> >>> >> you would have a "fise:SettingAnnotation" with:
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for USA with the dc:type caos:Agent,
>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "USA" and a fise:EntityAnnotation
>> >> >> >>> >>> >> linking to dbpedia:United_States
>> >> >> >>> >>> >>   * fise:ParticipantAnnotation for Iraq with the dc:type caos:Patient,
>> >> >> >>> >>> >> linking to a fise:TextAnnotation for "Irak" and a
>> >> >> >>> >>> >> fise:EntityAnnotation linking to dbpedia:Iraq
>> >> >> >>> >>> >>   * fise:OccurrentAnnotation for "invades" with the dc:type
>> >> >> >>> >>> >> caos:Activity, linking to a fise:TextAnnotation for "invades"
>> >> >> >>> >>> >>
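Once the subject and object of a clause are known (for example from the dependency trees this thread proposes to add), the role assignment described above could be approximated by a simple rule: in active voice the subject becomes the caos:Agent, the object the caos:Patient and the verb the caos:Activity. This is a naive assumption, not a CAOS requirement: passive voice is handled only through a flag the parser would have to supply, and an unintentional subject (a Cause in the role set listed earlier) is not detected. The Clause type is a hypothetical input.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser output: one clause with its subject, verb and object mentions. */
final class Clause {
    final String subject;
    final String verb;
    final String object; // for passive clauses: the agent of the by-phrase
    final boolean passive;

    Clause(String subject, String verb, String object, boolean passive) {
        this.subject = subject;
        this.verb = verb;
        this.object = object;
        this.passive = passive;
    }
}

final class CaosRoles {
    /** Maps each mention of the clause to the CAOS role used as dc:type of its annotation. */
    static Map<String, String> assign(Clause c) {
        // Naive rule: in active voice the subject acts on the object; passive voice swaps them.
        String agent = c.passive ? c.object : c.subject;
        String patient = c.passive ? c.subject : c.object;
        Map<String, String> roles = new LinkedHashMap<>();
        roles.put(agent, "caos:Agent");
        roles.put(patient, "caos:Patient");
        roles.put(c.verb, "caos:Activity");
        return roles;
    }

    public static void main(String[] args) {
        System.out.println(assign(new Clause("USA", "invades", "Irak", false)));
    }
}
```

Running main prints {USA=caos:Agent, Irak=caos:Patient, invades=caos:Activity}.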
>> >> >> >>> >>> >> > 2. Where does the verb, which links the Subject and the Object, come
>> >> >> >>> >>> >> > into this? I imagined that the Endurant would have a dc:"property" where
>> >> >> >>> >>> >> > the property = verb which links to the Object in noun form. For example,
>> >> >> >>> >>> >> > take again the sentence "USA invades Irak". You would have the "USA"
>> >> >> >>> >>> >> > Entity with dc:invader, which points to the Object "Irak". The Endurant
>> >> >> >>> >>> >> > would have as many dc:"property" elements as there are verbs which link
>> >> >> >>> >>> >> > it to an Object.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> As explained above you would have a fise:OccurrentAnnotation that
>> >> >> >>> >>> >> represents the Perdurant. The information that the activity mentioned in
>> >> >> >>> >>> >> the text is "invades" would be provided by linking to a fise:TextAnnotation.
>> >> >> >>> >>> >> If you can also provide an Ontology for Tasks that defines
>> >> >> >>> >>> >> "myTasks:invade", the fise:OccurrentAnnotation could also link to a
>> >> >> >>> >>> >> fise:EntityAnnotation for this concept.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> best
>> >> >> >>> >>> >> Rupert
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > ### Consuming the data:
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> I think this model should be sufficient for use cases as described by you.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Users would be able to consume data on the setting level. This can be
>> >> >> >>> >>> >> >> done by simply retrieving all fise:ParticipantAnnotation as well as
>> >> >> >>> >>> >> >> fise:OccurrentAnnotation linked with a setting. BTW this was the
>> >> >> >>> >>> >> >> approach used in LIVE [2] for semantic search. It allows queries for
>> >> >> >>> >>> >> >> Settings that involve specific Entities, e.g. you could filter for
>> >> >> >>> >>> >> >> Settings that involve a {Person}, activities:Arrested and a specific
>> >> >> >>> >>> >> >> {Uprising}. However, note that with this approach you will also get
>> >> >> >>> >>> >> >> results for Settings where the {Person} participated and another
>> >> >> >>> >>> >> >> person was arrested.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Another possibility would be to process enhancement results on the
>> >> >> >>> >>> >> >> fise:OccurrentAnnotation level. This would allow a much higher
>> >> >> >>> >>> >> >> granularity (e.g. it would allow correctly answering the query
>> >> >> >>> >>> >> >> used as an example above). But I am wondering if the quality of the
>> >> >> >>> >>> >> >> Setting extraction will be sufficient for this. I also have doubts
>> >> >> >>> >>> >> >> whether this can still be realized by using semantic indexing with
>> >> >> >>> >>> >> >> Apache Solr or if it would be better/necessary to store results in a
>> >> >> >>> >>> >> >> TripleStore and use SPARQL for retrieval.
>> >> >> >>> >>> >> >>
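The difference between the two ways of consuming the data can be shown on a tiny in-memory example: one setting in which a person participates while someone else is arrested. A setting-level filter matches it, while an occurrent-level filter that requires the person to participate in the arrest itself does not. The types and the data are invented for illustration only; in practice these checks would be Solr or SPARQL queries, as discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** An occurrent (e.g. activities:Arrested) with the participants linked via dc:has-participant. */
final class Occurrent {
    final String type;
    final List<String> participants;

    Occurrent(String type, List<String> participants) {
        this.type = type;
        this.participants = participants;
    }
}

/** A setting with all participants and occurrents extracted for it. */
final class Setting {
    final String id;
    final List<String> participants;
    final List<Occurrent> occurrents;

    Setting(String id, List<String> participants, List<Occurrent> occurrents) {
        this.id = id;
        this.participants = participants;
        this.occurrents = occurrents;
    }
}

final class SettingQueries {
    /** Setting level: person and activity only need to appear somewhere in the same setting. */
    static List<String> settingLevel(List<Setting> settings, String person, String activity) {
        List<String> hits = new ArrayList<>();
        for (Setting s : settings) {
            boolean hasActivity = s.occurrents.stream().anyMatch(o -> o.type.equals(activity));
            if (s.participants.contains(person) && hasActivity) {
                hits.add(s.id);
            }
        }
        return hits;
    }

    /** Occurrent level: the person must be a participant of that very activity. */
    static List<String> occurrentLevel(List<Setting> settings, String person, String activity) {
        List<String> hits = new ArrayList<>();
        for (Setting s : settings) {
            if (s.occurrents.stream().anyMatch(o -> o.type.equals(activity) && o.participants.contains(person))) {
                hits.add(s.id);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // ex:alice takes part in the setting, but only ex:bob is arrested
        Setting s1 = new Setting("ex:setting1", Arrays.asList("ex:alice", "ex:bob"),
                Arrays.asList(new Occurrent("activities:Arrested", Arrays.asList("ex:bob"))));
        List<Setting> all = Arrays.asList(s1);
        System.out.println(settingLevel(all, "ex:alice", "activities:Arrested"));   // [ex:setting1]
        System.out.println(occurrentLevel(all, "ex:alice", "activities:Arrested")); // []
    }
}
```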
>> >> >> >>> >>> >> >> The methodology and query language used by YAGO [3] is also very
>> >> >> >>> >>> >> >> relevant for this (especially note chapter 7, SPOTL(X) Representation).
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Another related topic is the enrichment of Entities (especially
>> >> >> >>> >>> >> >> Events) in knowledge bases based on Settings extracted from Documents.
>> >> >> >>> >>> >> >> By definition - in DOLCE - Perdurants are temporally indexed. That
>> >> >> >>> >>> >> >> means that at the time they are added to a knowledge base they might
>> >> >> >>> >>> >> >> still be in process. So the creation, enrichment and refinement of such
>> >> >> >>> >>> >> >> Entities in the knowledge base seems to be critical for a system
>> >> >> >>> >>> >> >> like the one described in your use case.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> On Tue, Jun 11, 2013 at 9:09 PM, Cristian Petroaca
>> >> >> >>> >>> >> >> <cristian.petroaca@gmail.com> wrote:
>> >> >> >>> >>> >> >> >
>> >> >> >>> >>> >> >> > First of all I have to mention that I am new to the field of semantic
>> >> >> >>> >>> >> >> > technologies; I started reading about them in the last 4-5 months.
>> >> >> >>> >>> >> >> > Having said that, I have a high-level overview of what a good approach
>> >> >> >>> >>> >> >> > to solving this problem is. There are a number of papers on the internet
>> >> >> >>> >>> >> >> > which describe what steps need to be taken, such as: named entity
>> >> >> >>> >>> >> >> > recognition, co-reference resolution, POS tagging and others.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> The Stanbol NLP processing module currently only supports sentence
>> >> >> >>> >>> >> >> detection, tokenization, POS tagging, Chunking, NER and lemmatization.
>> >> >> >>> >>> >> >> Support for co-reference resolution and dependency trees is currently
>> >> >> >>> >>> >> >> missing.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Stanford NLP is already integrated with Stanbol [4]. At the moment it
>> >> >> >>> >>> >> >> only supports English, but I am already working on including the other
>> >> >> >>> >>> >> >> supported languages. Other NLP frameworks that are already integrated
>> >> >> >>> >>> >> >> with Stanbol are Freeling [5] and Talismane [6]. But note that for all
>> >> >> >>> >>> >> >> of those the integration excludes support for co-reference and
>> >> >> >>> >>> >> >> dependency trees.
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> Anyway, I am confident that one can implement a first prototype by
>> >> >> >>> >>> >> >> only using Sentences and POS tags and - if available - Chunks (e.g.
>> >> >> >>> >>> >> >> Noun phrases).
>> >> >> >>> >>> >> >>
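As a rough illustration of such a prototype, the sketch below looks for noun-verb-noun patterns in one POS-tagged sentence, taking the nearest noun (or personal pronoun) before a verb as subject and the nearest one after it as object. It assumes Penn Treebank tags (NN*, VB*, PRP) and ignores Chunks; it will miss anything that needs real syntax, such as coordinated subjects or relative clauses. The Token and NaiveSvo names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** A token with its POS tag; a Penn Treebank tag set is assumed. */
final class Token {
    final String text;
    final String pos;

    Token(String text, String pos) {
        this.text = text;
        this.pos = pos;
    }

    boolean isNominal() { return pos.startsWith("NN") || pos.equals("PRP"); } // nouns and personal pronouns
    boolean isVerb() { return pos.startsWith("VB"); }
}

final class NaiveSvo {
    /** For each verb, pairs the nearest nominal to its left (subject) with the nearest to its right (object). */
    static List<String> extract(List<Token> sentence) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sentence.size(); i++) {
            if (!sentence.get(i).isVerb()) {
                continue;
            }
            String subject = null;
            for (int j = i - 1; j >= 0 && subject == null; j--) {
                if (sentence.get(j).isNominal()) subject = sentence.get(j).text;
            }
            String object = null;
            for (int j = i + 1; j < sentence.size() && object == null; j++) {
                if (sentence.get(j).isNominal()) object = sentence.get(j).text;
            }
            if (subject != null && object != null) {
                result.add(subject + "|" + sentence.get(i).text + "|" + object);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> s = Arrays.asList(new Token("USA", "NNP"), new Token("invades", "VBZ"), new Token("Irak", "NNP"));
        System.out.println(extract(s)); // [USA|invades|Irak]
    }
}
```

Given "Mary and Tom met Danny today" tagged as Mary/NNP and/CC Tom/NNP met/VBD Danny/NNP today/NN, the sketch returns only Tom|met|Danny and silently drops Mary, which is the kind of case where dependency trees would help.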
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> > I assume that in the Stanbol context, a feature like Relation extraction
>> >> >> >>> >>> >> > would be implemented as an EnhancementEngine?
>> >> >> >>> >>> >> > What kind of effort would be required for integrating a co-reference
>> >> >> >>> >>> >> > resolution tool into Stanbol?
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Yes, in the end it would be an EnhancementEngine. But before we can
>> >> >> >>> >>> >> build such an engine we would need to
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * extend the Stanbol NLP processing API with Annotations for co-reference
>> >> >> >>> >>> >> * add support for JSON Serialisation/Parsing for those annotations so
>> >> >> >>> >>> >> that the RESTful NLP Analysis Service can provide co-reference
>> >> >> >>> >>> >> information
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > At this moment I'll be focusing on 2 aspects:
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > 1. Determine the best data structure to encapsulate the extracted
>> >> >> >>> >>> >> > information. I'll take a closer look at DOLCE.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Don't make it too complex. Defining a proper structure to represent
>> >> >> >>> >>> >> Events will only pay off if we can also successfully extract such
>> >> >> >>> >>> >> information from processed texts.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> I would start with
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>  * fise:SettingAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>  * fise:ParticipantAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >> >> >>> >>> >>     * fise:suggestion {entityAnnotation} (multiple if there are more suggestions)
>> >> >> >>> >>> >>     * dc:type one of fise:Agent, fise:Patient, fise:Instrument, fise:Cause
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>  * fise:OccurrentAnnotation
>> >> >> >>> >>> >>     * {fise:Enhancement} metadata
>> >> >> >>> >>> >>     * fise:inSetting {settingAnnotation}
>> >> >> >>> >>> >>     * fise:hasMention {textAnnotation}
>> >> >> >>> >>> >>     * dc:type set to fise:Activity
>> >> >> >>> >>> >>
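Applied to the "USA invades Irak" example, the minimal model above yields a handful of triples. The sketch below builds them as Turtle-style strings; the ex: identifiers are hypothetical placeholders for the generated annotation URIs, prefix declarations are assumed to exist elsewhere, and the {fise:Enhancement} metadata is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Collects Turtle-style triples for the minimal setting model (prefix declarations omitted). */
final class MinimalSettingModel {
    private final List<String> triples = new ArrayList<>();

    private void add(String s, String p, String o) {
        triples.add(s + " " + p + " " + o + " .");
    }

    /** fise:SettingAnnotation; its {fise:Enhancement} metadata is left out of this sketch. */
    String setting(String id) {
        add(id, "a", "fise:SettingAnnotation");
        return id;
    }

    /** fise:ParticipantAnnotation with mention, zero or more entity suggestions and role (dc:type). */
    void participant(String id, String setting, String textAnnotation, String role, String... entityAnnotations) {
        add(id, "a", "fise:ParticipantAnnotation");
        add(id, "fise:inSetting", setting);
        add(id, "fise:hasMention", textAnnotation);
        for (String suggestion : entityAnnotations) {
            add(id, "fise:suggestion", suggestion);
        }
        add(id, "dc:type", role);
    }

    /** fise:OccurrentAnnotation for the activity mention (typically a verb). */
    void occurrent(String id, String setting, String textAnnotation) {
        add(id, "a", "fise:OccurrentAnnotation");
        add(id, "fise:inSetting", setting);
        add(id, "fise:hasMention", textAnnotation);
        add(id, "dc:type", "fise:Activity");
    }

    List<String> triples() {
        return Collections.unmodifiableList(triples);
    }

    public static void main(String[] args) {
        MinimalSettingModel m = new MinimalSettingModel();
        String setting = m.setting("ex:setting1");
        m.participant("ex:p-usa", setting, "ex:ta-usa", "fise:Agent", "ex:ea-usa");
        m.participant("ex:p-iraq", setting, "ex:ta-irak", "fise:Patient", "ex:ea-iraq");
        m.occurrent("ex:o-invades", setting, "ex:ta-invades");
        m.triples().forEach(System.out::println);
    }
}
```

Note that fise:suggestion points to the fise:EntityAnnotation, which in turn references the suggested entity (e.g. dbpedia:Iraq).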
>> >> >> >>> >>> >> If it turns out that we can extract more, we can add more structure to
>> >> >> >>> >>> >> those annotations. We might also think about using our own namespace
>> >> >> >>> >>> >> for those extensions to the annotation structure.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> > 2. Determine how all of this should be integrated into Stanbol.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> Just create an EventExtractionEngine and configure an enhancement chain
>> >> >> >>> >>> >> that does NLP processing and EntityLinking.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> You should have a look at
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * SentimentSummarizationEngine [1], as it does a lot of things with NLP
>> >> >> >>> >>> >> processing results (e.g. connecting adjectives (via verbs) to
>> >> >> >>> >>> >> nouns/pronouns). So as long as we cannot use explicit dependency trees,
>> >> >> >>> >>> >> your code will need to do similar things with Nouns, Pronouns and
>> >> >> >>> >>> >> Verbs.
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> * Disambiguation-MLT engine, as it creates a Java representation of
>> >> >> >>> >>> >> present fise:TextAnnotation and fise:EntityAnnotation [2]. Something
>> >> >> >>> >>> >> similar will also be required by the EventExtractionEngine for fast
>> >> >> >>> >>> >> access to such annotations while iterating over the Sentences of the
>> >> >> >>> >>> >> text.
>> >> >> >>> >>> >>
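For the fast access mentioned in the second point, one option is to index annotations by their start offset in a sorted map, so that the annotations of a sentence can be fetched with a range query while iterating over the sentences. This is an assumption about one possible design, not a description of how DisambiguationData is implemented; the Annotation and AnnotationIndex types are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical stand-in for a fise:TextAnnotation: a selected span with character offsets. */
final class Annotation {
    final int start;
    final int end; // exclusive
    final String selectedText;

    Annotation(int start, int end, String selectedText) {
        this.start = start;
        this.end = end;
        this.selectedText = selectedText;
    }
}

/** Indexes annotations by start offset so each sentence can fetch its annotations with a range query. */
final class AnnotationIndex {
    // start offset -> annotations beginning at that offset (several may share one start)
    private final NavigableMap<Integer, List<Annotation>> byStart = new TreeMap<>();

    void add(Annotation a) {
        byStart.computeIfAbsent(a.start, k -> new ArrayList<>()).add(a);
    }

    /** Annotations lying completely inside [sentenceStart, sentenceEnd), ordered by start offset. */
    List<Annotation> within(int sentenceStart, int sentenceEnd) {
        List<Annotation> result = new ArrayList<>();
        for (List<Annotation> group : byStart.subMap(sentenceStart, true, sentenceEnd, false).values()) {
            for (Annotation a : group) {
                if (a.end <= sentenceEnd) {
                    result.add(a);
                }
            }
        }
        return result;
    }
}
```

Each lookup costs a logarithmic search plus the number of annotations returned, instead of a scan over all annotations for every sentence.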
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> best
>> >> >> >>> >>> >> Rupert
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/sentiment-summarization/src/main/java/org/apache/stanbol/enhancer/engines/sentiment/summarize/SentimentSummarizationEngine.java
>> >> >> >>> >>> >> [2] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/disambiguation-mlt/src/main/java/org/apache/stanbol/enhancer/engine/disambiguation/mlt/DisambiguationData.java
>> >> >> >>> >>> >>
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> > Thanks
>> >> >> >>> >>> >> >
>> >> >> >>> >>> >> >> Hope this helps to bootstrap this discussion
>> >> >> >>> >>> >> >> best
>> >> >> >>> >>> >> >> Rupert
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >> >> --
>> >> >> >>> >>> >> >> | Rupert Westenthaler             rupert.westenthaler@gmail.com
>> >> >> >>> >>> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
>> >> >> >>> >>> >> >> | A-5500 Bischofshofen
>> >> >> >>> >>> >> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>> >>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>
>
>
