stanbol-dev mailing list archives

From Walter Kasper <>
Subject Re: User story: Don't want to lose the semantic information I already have inside my CMS
Date Thu, 08 Nov 2012 11:31:38 GMT
Hi Rüdiger,

RDFa extraction from HTML is part of the htmlextractor engine in 
Stanbol. I would welcome it if you could test it with your OpenCms docs.
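To try this with OpenCms pages, one option is to POST the HTML to the enhancer's REST endpoint and ask for RDF back. The Python sketch only builds and sends that request. It assumes a local Stanbol launcher at http://localhost:8080, that the default chain (or a chain you name) actually includes htmlextractor, and that Turtle is an accepted response format. The function names and any chain name are hypothetical, not part of Stanbol.

```python
import urllib.request
from typing import Optional

# Assumption: a local Stanbol launcher on its default port; adjust to your setup.
STANBOL_ENHANCER = "http://localhost:8080/enhancer"


def build_enhancement_request(html: str,
                              base_url: str = STANBOL_ENHANCER,
                              chain: Optional[str] = None) -> urllib.request.Request:
    """Prepare (without sending) a POST of an HTML document to the enhancer.

    With `chain`, the request goes to <base_url>/chain/<chain>; the chain name is
    a placeholder and must refer to a chain that actually includes htmlextractor.
    """
    url = f"{base_url}/chain/{chain}" if chain else base_url
    return urllib.request.Request(
        url,
        data=html.encode("utf-8"),
        method="POST",
        headers={
            # Declare the payload as HTML so HTML-aware engines can pick it up
            "Content-Type": "text/html; charset=UTF-8",
            # Ask for the enhancement metadata serialised as Turtle
            "Accept": "text/turtle",
        },
    )


def enhance(html: str, **kwargs) -> str:
    """Send the request to the (assumed) running server and return the RDF body."""
    req = build_enhancement_request(html, **kwargs)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Building the request separately from sending it makes it easy to inspect the URL and headers before pointing it at a real server.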

Best regards,


Rüdiger Kurz wrote:
> Hi Stanbolers,
> During ApacheCon in Sinsheim I had some interesting conversations with
> Fabian, Rupert and Anil, and as a result I want to summarize one of the
> discussions as a user story describing a typical requirement for us as
> a CMS provider.
> It is not correct to talk about traditional Content Management Systems
> as if they stored no semantic information. For example, CMSs already
> deliver RDFa-annotated HTML, and nearly all of them provide some
> tagging/categorization mechanism. In particular, OpenCms provides a
> generic way to define structured content, so we know that a specific
> field/item of a piece of content has a specified type and a defined
> label. For example, the fact that a technology event named ApacheCon
> takes place in Sinsheim from 5 to 8 November 2012 is already stored in
> OpenCms. Moreover, OpenCms can connect that event with all the
> speakers/persons who will give a presentation at it, ...
> What we would like to achieve is not only plain-text enhancement; we
> also want to tell Stanbol all the information and associations we
> already know. In other words, we absolutely do not want to lose the
> semantic information that already exists in OpenCms.
> A good starting point would be a REST endpoint that can retrieve an
> RDFa-annotated HTML document and then extract the RDFa in order to
> store it inside the semantic-index/entity-hub/..., as I previously
> suggested on the list under the subject "Extend stanbol content hub for
> RDFa support". Maybe the content hub is not the right component, but
> the requirement for RDFa extraction still stands.
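To illustrate the extraction step itself (pulling RDFa statements out of annotated HTML before storing them), here is a deliberately small Python sketch using only the standard library. It is not how htmlextractor works and not a full RDFa 1.1 processor: it covers only vocab, about, typeof, property and content, and the class name, helper name and example.org URI are hypothetical. The sample markup encodes the ApacheCon event from the message above.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# HTML elements that never get an end tag, so they must not be pushed on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SimpleRDFaExtractor(HTMLParser):
    """Collect (subject, predicate, object) triples from a small RDFa Lite subset.

    Supported: vocab, about, typeof, property, content. Not supported: prefixes/CURIEs,
    rel/rev, resource/href objects, datatypes, property+typeof on one element.
    """

    def __init__(self, base=""):
        super().__init__()
        self.base = base      # subject used before any about/typeof is seen
        self.triples = []
        self._stack = []      # open non-void elements, innermost last
        self._bnodes = 0

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Inherit vocabulary and current subject from the enclosing element
        if self._stack:
            vocab, subject = self._stack[-1]["vocab"], self._stack[-1]["subject"]
        else:
            vocab, subject = "", self.base
        if a.get("vocab") is not None:
            vocab = a["vocab"]
        if a.get("about") is not None:
            subject = a["about"]
        elif a.get("typeof"):
            # A typed element without @about introduces a fresh blank node
            self._bnodes += 1
            subject = f"_:b{self._bnodes}"
        for t in (a.get("typeof") or "").split():
            self.triples.append((subject, RDF_TYPE, vocab + t))

        pending = []  # properties whose value is the element's text content
        for p in (a.get("property") or "").split():
            if a.get("content") is not None:
                # @content takes precedence over the element's text
                self.triples.append((subject, vocab + p, a["content"]))
            else:
                pending.append(vocab + p)
        if tag not in VOID_TAGS:
            self._stack.append({"tag": tag, "vocab": vocab, "subject": subject,
                                "pending": pending, "text": []})

    def handle_data(self, data):
        # Text belongs to every open element still waiting for a literal value
        for frame in self._stack:
            if frame["pending"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Find the matching open element; ignore stray end tags
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i]["tag"] == tag:
                break
        else:
            return
        while len(self._stack) > i:
            frame = self._stack.pop()
            value = " ".join("".join(frame["text"]).split())  # normalise whitespace
            for p in frame["pending"]:
                self.triples.append((frame["subject"], p, value))


def extract_rdfa(html, base=""):
    parser = SimpleRDFaExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = """<div vocab="http://schema.org/" typeof="Event" about="http://example.org/events/apachecon">
  <span property="name">ApacheCon</span> in
  <span property="location">Sinsheim</span>, from
  <time property="startDate" content="2012-11-05">05. Nov</time> until
  <time property="endDate" content="2012-11-08">08. Nov 2012</time>
</div>"""
    for triple in extract_rdfa(sample):
        print(triple)
```

Passing the vocabulary and current subject down through the element stack mirrors RDFa's evaluation context; a conforming processor adds prefixes, rel/rev chaining and datatypes on top of that.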

Dr. Walter Kasper
Stuhlsatzenhausweg 3
D-66123 Saarbrücken
Tel.:  +49-681-85775-5300
Fax:   +49-681-85775-5338
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern

Management:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff

Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes

Register court: Amtsgericht Kaiserslautern, HRB 2313
