nutch-dev mailing list archives

From Michael Wechner
Subject Getting a semantic version of an "HTML page"
Date Tue, 06 Feb 2007 15:44:57 GMT

Is there a standardized way for Nutch to obtain a semantic version
of a web page? For example, suppose the HTML page contains the following:

 <link rel="semantic-content" href="index-semantic.xml"/>
blablabal ..

and the semantic XML (index-semantic.xml) would be something more useful
than the HTML itself, for example:

<?xml version="1.0"?>

<semantic-of href="index.html">

or alternatively some RDF or a similar format.
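
To make the idea concrete, the discovery step could look roughly like the
sketch below. It is plain Python using only the standard library, not a Nutch
API. The rel value "semantic-content" is just the convention from the example
above, not an established standard, and the names SemanticLinkFinder and
find_semantic_urls as well as the example.org URL are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SemanticLinkFinder(HTMLParser):
    """Collects href values of <link> elements carrying a given rel token."""

    def __init__(self, rel="semantic-content"):
        super().__init__()
        self.rel = rel      # assumed convention, not a registered link relation
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if self.rel in rel_tokens and href:
            self.hrefs.append(href)


def find_semantic_urls(html, base_url):
    """Return absolute URLs of all semantic-content links found in html."""
    finder = SemanticLinkFinder()
    finder.feed(html)
    finder.close()
    # Resolve relative hrefs against the URL of the HTML page itself.
    return [urljoin(base_url, href) for href in finder.hrefs]


page = (
    '<html><head>'
    '<link rel="semantic-content" href="index-semantic.xml"/>'
    '</head><body>blablabal ..</body></html>'
)
print(find_semantic_urls(page, "http://example.org/docs/index.html"))
```

A crawler could then fetch the returned URL and index the XML (or RDF) in
place of, or in addition to, the HTML page.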

Any pointers are very welcome.



Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
+41 44 272 91 61
