stanbol-dev mailing list archives

From Enrico Daga <enricod...@gmail.com>
Subject Re: Stanbol Enhancement Structure (discussion)
Date Tue, 01 Mar 2011 18:10:43 GMT
On 1 March 2011 18:39, Rupert Westenthaler <rwesten@apache.org> wrote:
> On Tue, Mar 1, 2011 at 5:23 PM, Enrico Daga <enricodaga@gmail.com> wrote:
>> Hi Rupert, all,
>> first, thank you for setting up the proposal.
>> I have a few considerations I want to share after a first read.
>>
>> About annotation roles: IMHO there is no difference between Tag and
>> Keyword, so we should keep only one.
>>
> That is something that needs to be discussed.
> When you look at "keywords" extracted e.g. by SalsaDev, one can
> clearly recognize a difference from the tags suggested by other engines. I
> refer to the fact that keywords, in the case of SalsaDev, usually refer
> to "normal" words that are somehow central within a document, while
> tags are usually much more related to some kind of entity.
Maybe, but then I would not say Tag but Entity; a tag is something that
reminds me of a label that I apply to classify the item...
OK... this needs to be discussed :)

>
>> Alongside Annotation and Enhancement, I would also consider adding
>> another concept: Embedded Knowledge, which should be fed into a
>> separate graph; this graph would then host any triples directly or
>> indirectly embedded in the content.
>>
>
> That was already discussed in Istanbul, and it is very likely that we
> will implement it exactly like that, but such knowledge is not related
> to the enhancement structure and is therefore not part of this
> specification.
I don't think so: if the specification wants to model the results of
enhancement engines, I think all kinds of engine results should be
taken into account. Actually, we would implement an RDFa extractor
in exactly the same way as the LocationEnhancementEngine. This risks
being confusing.

>
>> About annotations, I would remove the fixed list of entity types (Person,
>> Organization, Location), since this is closely tied to the individual
>> engine and should be easily extensible (or keep them, but allow for
>> the possibility that some engine could extract "Fruit" without the
>> need to change the ontology).
>
> I do not agree with that. My counterarguments are:
>  - all NLP tools support these kinds of entities
>  - with only a few types, e.g. Person, Organization, Location,
> Activity, one can cover a lot of the detected entities
>  - for users who need to group detected entities it is much easier to
> deal with a fixed list than to write their own clustering algorithm for
> dealing with an extensible list. In my opinion the 4 types above +
> others should be OK for most use cases.
>  - If someone wants/needs to process the exact types of extracted
> features, he can still use sb:entity-type. I expect domain-specific
> applications in particular to have special support for entities
> using a type that is part of their domain ontology.
>
> Based on this, "Banana" would end up in "other", and the type "Fruit"
> would be available via the "sb:entity-type" property. A supermarket
> application might, however, have its own "Fruit" category, and the
> banana would show up there.
> To summarize, my goal with the dc:type property is not to be flexible
> or semantically correct, but to make it easy for users to consume the
> enhancements. The flexibility and extensibility are provided by
> "sb:entity-type".
>
> Does that make sense to you?
In principle, no, but I agree that a fixed list is easier to adopt.
My opinion is that the Stanbol vocabulary should avoid domain-specific
terms and leave them to the engine implementers.
I also guess (but I cannot prove it ;) ) that if the engines provided by
default use some terms, then future engines will likely reuse
those terms, to better support adoption. So I still do not see the
need for a fixed set of entity types.
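A minimal Python sketch of the two-level typing Rupert describes above, under stated assumptions: the coarse category set (Person, Organization, Location, Activity) and the use of plain dictionaries instead of real RDF triples are illustrative only, not the actual Stanbol vocabulary or API. Only the property names dc:type and sb:entity-type come from the thread. dc:type receives one of the fixed categories, or "Other" as a fallback, while sb:entity-type keeps the exact, possibly domain-specific type.

```python
from collections import defaultdict

# Hypothetical fixed category list; illustrative only, not the actual
# Stanbol vocabulary.
FIXED_CATEGORIES = {"Person", "Organization", "Location", "Activity"}


def annotate(label, entity_type):
    """Build the annotation properties for one detected entity.

    'dc:type' holds a fixed coarse category (or 'Other' as fallback),
    while 'sb:entity-type' keeps the exact, possibly domain-specific type.
    """
    coarse = entity_type if entity_type in FIXED_CATEGORIES else "Other"
    return {"label": label, "dc:type": coarse, "sb:entity-type": entity_type}


def group_by(annotations, prop):
    """Group entity labels by the value of one annotation property."""
    groups = defaultdict(list)
    for annotation in annotations:
        groups[annotation[prop]].append(annotation["label"])
    return dict(groups)


annotations = [
    annotate("Salzburg", "Location"),
    annotate("Apache", "Organization"),
    annotate("Banana", "Fruit"),
]

# A generic client only needs to know the fixed categories.
print(group_by(annotations, "dc:type"))
# {'Location': ['Salzburg'], 'Organization': ['Apache'], 'Other': ['Banana']}

# A supermarket application can group on the exact type instead.
print(group_by(annotations, "sb:entity-type"))
# {'Location': ['Salzburg'], 'Organization': ['Apache'], 'Fruit': ['Banana']}
```

Enrico's alternative would roughly amount to dropping the fixed set and letting every client group on sb:entity-type directly, as in the second call.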

>
>>
>> In a future version it would be nice to find a way to let engines
>> declare what contribution they are going to provide (tagging?
>> categorization? metadata? embedded knowledge?) and how (adding
>> annotation roles? entity types? metadata fields?)
>>
> Yeah, that sounds like an interesting idea.
I have opened a Jira issue (STANBOL-107); it would be nice to start
with some examples, as Olivier suggested, but at the moment I have no
idea :)

Enrico

>
> thx for the feedback
>
> best
> Rupert
>
>
> --
> | Rupert Westenthaler                            rwesten@apache.org
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
Enrico Daga

--
http://www.enridaga.net
skype: enri-pan
