stanbol-dev mailing list archives

From Rupert Westenthaler <rwes...@apache.org>
Subject Re: Stanbol Enhancement Structure (discussion)
Date Tue, 01 Mar 2011 11:56:36 GMT
On Tue, Mar 1, 2011 at 9:58 AM, Bertrand Delacretaz
<bdelacretaz@apache.org> wrote:
> Can we publish it to http://incubator.apache.org/stanbol/ ? It will
> show up there anyway as soon as someone publishes the site for another
> reason.
For sure. By the time I finally got all the markdown syntax right it was
already much too late, and I was no longer in the mood to play around with
publishing.


On Tue, Mar 1, 2011 at 10:07 AM, Tommaso Teofili
<tommaso.teofili@gmail.com> wrote:
> Hi Rupert,
> I had a quick read of your proposal and I think it's good; the only thing I
> notice is that, if I understood it correctly, the Annotation object can be
> related to "something" not actually contained in the parsed content.
> So think, for example, of a Concept Annotation: the concept is something
> abstract that can be "discovered" from the text of the content item but
> doesn't have any Occurrence in the parsed text. I wonder whether Annotation is
> the proper name for that, since Annotation makes me think of a span of text
> or data I can find in the parsed content. That (maybe) being a minor concern,
> I like the proposal.

That is a valid point, and one I would never have thought of, because in
my thinking Annotations are extracted from the ContentItem (the
interpretation of the content) and not from the Content (the data). I
agree that Metadata are not part of the Content, but they are certainly
part of the interpretation, so in my mental model there was no problem.

The whiteboard behind me still shows a concept named "Metadata", but I
never liked it, and when Andreas Gruber suggested that I could instead
model it as an annotation with an occurrence within the metadata I was
really happy to get rid of it.
Maybe it is time to re-introduce it, but in a different manner.

None of the Concepts used within this specification are intended to be
processed by users. They are mainly there to group useful sets of
properties (something like attribute groups in XSD or interfaces in
Java). The really important things are properties like
 - "dc:type": defining the type of the extracted feature
 - "dc:role": defining the role of the extracted feature and
 - "sb:entity": pointing to the definition of the extracted feature
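
To make the "group of properties" idea concrete, here is a minimal Python
sketch. The Annotation class, its field names and the as_triples helper are
hypothetical and only mirror the three properties listed above; none of this
is Stanbol API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical property group; not part of any Stanbol API."""
    dc_type: str                     # "dc:type": type of the extracted feature
    dc_role: str                     # "dc:role": role of the extracted feature
    sb_entity: Optional[str] = None  # "sb:entity": pointer to the definition, if known


def as_triples(subject: str, annotation: Annotation) -> List[Tuple[str, str, str]]:
    """Flatten an annotation into (subject, predicate, object) triples."""
    triples = [
        (subject, "dc:type", annotation.dc_type),
        (subject, "dc:role", annotation.dc_role),
    ]
    # sb:entity is only emitted when the engine could point to a definition.
    if annotation.sb_entity is not None:
        triples.append((subject, "sb:entity", annotation.sb_entity))
    return triples
```

A consumer would then read the properties directly and would not need to
care which Concept the annotation was grouped under.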

And I think that's where the example "Enhancement of Metadata" got one
thing wrong.
It is not a good idea to define the annotation referring to the creator
of the document as "sb:Tag".
I refer to
> <a1> dc:title "Richard Cypher"
> <a1> dc:role sb:Tag
> <a1> dc:type dbpedia-ont:Person
A document should not be tagged with its creator unless it is also
about the creator (e.g. in the case of a CV).
This Annotation has a different role: it provides information needed
for management. This should be reflected by the value of the
"dc:role" property. So maybe we should add a new role such as
"sb:Management".

However, this is not true for all features extracted from metadata.
For example, when extracting the Artist, Title, Album and Genre from the
ID3 metadata of an mp3 file, it makes complete sense to tag the audio
file with all these values. It really depends on the meaning of the
field, and EnhancementEngines specific to such standards should deal
with it.
Generally speaking, getting dc:role values right is not an easy task,
but I think this is OK because it will be one of the things that
distinguishes good from not-so-good enhancement engines. The important
thing is that the defined roles are clear and easily understandable by
users, because they will use them to filter enhancement results.
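
As a rough illustration of that kind of filtering, here is a small Python
sketch. It assumes annotations are available as plain dictionaries keyed by
property name; the filter_by_role helper and the second sample annotation
are made up for the example, and "sb:Management" is only the role proposed
above, not a defined term.

```python
from typing import Dict, Iterable, List


def filter_by_role(annotations: Iterable[Dict[str, str]], role: str) -> List[Dict[str, str]]:
    """Keep only the annotations whose dc:role equals the requested role."""
    return [a for a in annotations if a.get("dc:role") == role]


# Illustrative data only: the creator carries the proposed management role,
# while a made-up genre value from audio metadata is a regular tag.
annotations = [
    {"dc:title": "Richard Cypher", "dc:type": "dbpedia-ont:Person",
     "dc:role": "sb:Management"},
    {"dc:title": "Example Genre", "dc:role": "sb:Tag"},
]

tags = filter_by_role(annotations, "sb:Tag")
management = filter_by_role(annotations, "sb:Management")
```

With the roles separated like this, a user asking for tags would not get
the document creator back unless an engine had explicitly assigned that
role.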

best
Rupert Westenthaler



-- 
| Rupert Westenthaler                            rwesten@apache.org
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
