any23-commits mailing list archives

From lewi...@apache.org
Subject svn commit: r1538470 [1/5] - in /any23/site: ./ apidocs/ apidocs/0.8.0/ apidocs/0.8.0/org/ apidocs/0.8.0/resources/ apidocs/org/ apidocs/resources/
Date Sun, 03 Nov 2013 21:57:51 GMT
Author: lewismc
Date: Sun Nov  3 21:57:50 2013
New Revision: 1538470

URL: http://svn.apache.org/r1538470
Log:
publish new site for 0.9.1

Added:
    any23/site/apidocs/0.8.0/
    any23/site/apidocs/0.8.0/allclasses-frame.html
      - copied unchanged from r1538438, any23/site/apidocs/allclasses-frame.html
    any23/site/apidocs/0.8.0/allclasses-noframe.html
      - copied unchanged from r1538438, any23/site/apidocs/allclasses-noframe.html
    any23/site/apidocs/0.8.0/constant-values.html
      - copied unchanged from r1538438, any23/site/apidocs/constant-values.html
    any23/site/apidocs/0.8.0/deprecated-list.html
      - copied unchanged from r1538438, any23/site/apidocs/deprecated-list.html
    any23/site/apidocs/0.8.0/help-doc.html
      - copied unchanged from r1538438, any23/site/apidocs/help-doc.html
    any23/site/apidocs/0.8.0/index-all.html
      - copied unchanged from r1538438, any23/site/apidocs/index-all.html
    any23/site/apidocs/0.8.0/index.html
      - copied unchanged from r1538438, any23/site/apidocs/index.html
    any23/site/apidocs/0.8.0/options
      - copied unchanged from r1538438, any23/site/apidocs/options
    any23/site/apidocs/0.8.0/org/
      - copied from r1538438, any23/site/apidocs/org/
    any23/site/apidocs/0.8.0/overview-frame.html
      - copied unchanged from r1538438, any23/site/apidocs/overview-frame.html
    any23/site/apidocs/0.8.0/overview-summary.html
      - copied unchanged from r1538438, any23/site/apidocs/overview-summary.html
    any23/site/apidocs/0.8.0/overview-tree.html
      - copied unchanged from r1538438, any23/site/apidocs/overview-tree.html
    any23/site/apidocs/0.8.0/package-list
      - copied unchanged from r1538438, any23/site/apidocs/package-list
    any23/site/apidocs/0.8.0/packages
      - copied unchanged from r1538438, any23/site/apidocs/packages
    any23/site/apidocs/0.8.0/resources/
      - copied from r1538438, any23/site/apidocs/resources/
    any23/site/apidocs/0.8.0/serialized-form.html
      - copied unchanged from r1538438, any23/site/apidocs/serialized-form.html
    any23/site/apidocs/0.8.0/stylesheet.css
      - copied unchanged from r1538438, any23/site/apidocs/stylesheet.css
Removed:
    any23/site/apidocs/allclasses-frame.html
    any23/site/apidocs/allclasses-noframe.html
    any23/site/apidocs/constant-values.html
    any23/site/apidocs/deprecated-list.html
    any23/site/apidocs/help-doc.html
    any23/site/apidocs/index-all.html
    any23/site/apidocs/index.html
    any23/site/apidocs/options
    any23/site/apidocs/org/
    any23/site/apidocs/overview-frame.html
    any23/site/apidocs/overview-summary.html
    any23/site/apidocs/overview-tree.html
    any23/site/apidocs/package-list
    any23/site/apidocs/packages
    any23/site/apidocs/resources/
    any23/site/apidocs/serialized-form.html
    any23/site/apidocs/stylesheet.css
Modified:
    any23/site/acknowledgements.html
    any23/site/any23-plugins.html
    any23/site/build-src.html
    any23/site/configuration.html
    any23/site/dev-csv-extractor.html
    any23/site/dev-data-conversion.html
    any23/site/dev-data-extraction.html
    any23/site/dev-microdata-extractor.html
    any23/site/dev-microformat-extractors.html
    any23/site/dev-validation-fix.html
    any23/site/dev-xpath-extractor.html
    any23/site/developers.html
    any23/site/download.html
    any23/site/extractors.html
    any23/site/getting-started.html
    any23/site/index.html
    any23/site/install.html
    any23/site/integration.html
    any23/site/issue-tracking.html
    any23/site/license.html
    any23/site/mail-lists.html
    any23/site/plugin-basic-crawler.html
    any23/site/plugin-html-scraper.html
    any23/site/plugin-office-scraper.html
    any23/site/poweredby.html
    any23/site/project-info.html
    any23/site/project-reports.html
    any23/site/release-howto.html
    any23/site/service.html
    any23/site/source-repository.html
    any23/site/supported-formats.html
    any23/site/team-list.html

Modified: any23/site/acknowledgements.html
URL: http://svn.apache.org/viewvc/any23/site/acknowledgements.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/acknowledgements.html (original)
+++ any23/site/acknowledgements.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Acknowledgments</title>
+    <title>Apache Any23 - Apache Any23 - Acknowledgments</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -214,7 +214,21 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Acknowledgments<a name="Acknowledgments"></a></h2><p>The original code base comes from open-sourcing the <i>&quot;RDFizer&quot;</i> component of the <a class="externalLink" href="http://www.sindice.com">Sindice</a> search engine. The project is supported by <a class="externalLink" href="http://www.apache.ie/">DERI, NUI Galway</a>, <a class="externalLink" href="http://wed.fbk.eu/en/home">Web of Data - FBK</a> and the <a class="externalLink" href="http://www.okkam.org/">OKKAM project (ICT-215032)</a>.</p><p>Individual developers who have contributed to <b>any23</b> include (in alphabetic order): Michele Catasta, Richard Cyganiak, Michele Mostarda, Davide Palmisano, Gabriele Renzi, Juergen Umbrich.</p><p>Below the initial sponsors of the <i>Any23</i> project.</p><img src="./images/logo-sindice-90x30.png" alt="" /><p><a class="externalLink" href="http://sindice.com/">Sindice</a></p><p>Sindice is a platform to build applications on top of this data. Sindice collects Web Data in many ways, following existing web standards, and offers Search and Querying across this data, updated live every few minutes.</p><img src="./images/logo-deri-90x30.png" alt="" /><p><a class="externalLink" href="http://www.deri.ie/">Digital Enterprise Research Institute</a></p><p>The vision of the Digital Enterprise Research Institute (DERI) is to be recognised as one of the leading international Web Science research institutes interlinking technologies, information and people to advance business and benefit society.</p><img src="./images/logo-fbk-90x30.png" alt="" /><p><a class="externalLink" href="http://www.fbk.eu/">Fondazione Bruno Kessler</a></p><p>FBK is a research organization of the Autonomous Province of Trento that promotes research in the areas of science, technology, and humanities. Thanks to a close network of alliances and collaborations, FBK also conducts research in theoretical nuclear physics, networking, telecommunications, and social sciences.</p><img src="./images/logo-okkam-90x30.png" alt="" /><p><a class="externalLink" href="http://www.okkam.org/">OKKAM</a></p><p>The OKKAM project aims at enabling the Web of Entities, namely a virtual space where any collection of data and information about any type of entities (e.g. people, locations, organizations, events, products, ...) published on the Web can be integrated into a single virtual, decentralized, open knowledge base. </p><img src="./images/logo-lod2-90x30.png" alt="" /><p><a class="externalLink" href="http://lod2.eu/">LOD2</a></p><p>The LOD2 consortium comprises expertise in Semantic Web technologies, ontological engineering, machine learning, Web search, information retrieval, databases and knowledge stores. With CWI's reputation in the database world, LOD2 aims to substantially contribute to cross-fertilization between database and semantic web research.</p></div>
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Acknowledgments<a name="Acknowledgments"></a></h2>
+<p>The original code base comes from open-sourcing the <i>&quot;RDFizer&quot;</i> component of the <a class="externalLink" href="http://www.sindice.com">Sindice</a> search engine. The project is supported by <a class="externalLink" href="http://www.apache.ie/">DERI, NUI Galway</a>, <a class="externalLink" href="http://wed.fbk.eu/en/home">Web of Data - FBK</a> and the <a class="externalLink" href="http://www.okkam.org/">OKKAM project (ICT-215032)</a>.</p>
+<p>Individual developers who have contributed to <b>any23</b> include (in alphabetic order): Michele Catasta, Richard Cyganiak, Michele Mostarda, Davide Palmisano, Gabriele Renzi, Juergen Umbrich.</p>
+<p>Below the initial sponsors of the <i>Any23</i> project.</p><img src="./images/logo-sindice-90x30.png" alt="" />
+<p><a class="externalLink" href="http://sindice.com/">Sindice</a></p>
+<p>Sindice is a platform to build applications on top of this data. Sindice collects Web Data in many ways, following existing web standards, and offers Search and Querying across this data, updated live every few minutes.</p><img src="./images/logo-deri-90x30.png" alt="" />
+<p><a class="externalLink" href="http://www.deri.ie/">Digital Enterprise Research Institute</a></p>
+<p>The vision of the Digital Enterprise Research Institute (DERI) is to be recognised as one of the leading international Web Science research institutes interlinking technologies, information and people to advance business and benefit society.</p><img src="./images/logo-fbk-90x30.png" alt="" />
+<p><a class="externalLink" href="http://www.fbk.eu/">Fondazione Bruno Kessler</a></p>
+<p>FBK is a research organization of the Autonomous Province of Trento that promotes research in the areas of science, technology, and humanities. Thanks to a close network of alliances and collaborations, FBK also conducts research in theoretical nuclear physics, networking, telecommunications, and social sciences.</p><img src="./images/logo-okkam-90x30.png" alt="" />
+<p><a class="externalLink" href="http://www.okkam.org/">OKKAM</a></p>
+<p>The OKKAM project aims at enabling the Web of Entities, namely a virtual space where any collection of data and information about any type of entities (e.g. people, locations, organizations, events, products, ...) published on the Web can be integrated into a single virtual, decentralized, open knowledge base. </p><img src="./images/logo-lod2-90x30.png" alt="" />
+<p><a class="externalLink" href="http://lod2.eu/">LOD2</a></p>
+<p>The LOD2 consortium comprises expertise in Semantic Web technologies, ontological engineering, machine learning, Web search, information retrieval, databases and knowledge stores. With CWI's reputation in the database world, LOD2 aims to substantially contribute to cross-fertilization between database and semantic web research.</p></div>
                   </div>
             </div>
           </div>

Modified: any23/site/any23-plugins.html
URL: http://svn.apache.org/viewvc/any23/site/any23-plugins.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/any23-plugins.html (original)
+++ any23/site/any23-plugins.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Plugins</title>
+    <title>Apache Any23 - Apache Any23 - Plugins</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -214,7 +214,43 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Apache Any23 Plugins<a name="Apache_Any23_Plugins"></a></h2><div class="section"><h3>Introduction<a name="Introduction"></a></h3><p>This section describes the <i>Apache Any23</i> plugins support.</p><p><i>Apache Any23</i> comes with a set of predefined plugins. Such plugins are located under the <i>any23-root</i>/<b>plugins</b> dir.</p><p>A plugin is a standard <i>Maven3</i> module containing any implementation of</p><ul><li><a href="./xref/org/apache/any23/plugin/ExtractorPlugin.html">ExtractorPlugin</a></li><li><a href="./xref/org/apache/any23/cli/Tool.html">Tool</a></li></ul></div><div class="section"><h3>How to Register a Plugin<a name="How_to_Register_a_Plugin"></a></h3><p>A plugin can be added to the <i>Apache Any23 CLI</i> interface by:</p><ul><li>adding its <i>JAR</i> to the <i>Apache Any23</i> <i>JVM classpath</i>;</li><li>adding its <i>JAR</i> to the CLASSPATH_PREFIX environment variable as:<div class="source"><pre class="prettyprint">export CLASSPATH_PREFIX=../../../plugins/basic-crawler/target/any23-basic-crawler-VERSION.jar</pre></div></li><li>adding its <i>JAR</i> to the <i>$HOME/.any23/plugins</i> directory.<p>A plugin can be added to the <i>Apache Any23 library API</i> by using the <a href="./xref/org/apache/any23/plugin/Any23PluginManager.html">Any23PluginManager</a>#createInstance(Configuration configuration, File... pluginLocations) method.</p><p>TODO: plugin support in Apache Any23 Service</p><p>Any implementation of <i>ExtractorPlugin</i> will automatically registered to the <a href="./xref/org/apache/any23/extractor/ExtractorRegistry.html">ExtractorRegistry</a>.</p><p>Any detected implementation of <i>Tool</i> will be listed by the <i>ToolRunner</i> command-line tool in <i>any23-root/</i><b>bin/any23</b> .</p></li></ul></div><div class="section"><h3>How to Build a Plugin<a name="How_to_Build_a_Plugin"></a></h3><p><i>Apache Any23</i> takes care to <i>test</i> and <i>package</i> plugins when distributed from its reactor <i>POM</i>. It is aways possible to rebuild a plugin using the command:</p><div class="source"><pre class="prettyprint">&lt;plugin-dir&gt;$ mvn clean assembly:assembly</pre></div></div><div class="section"><h3>How to Write an Extractor Plugin<a name="How_to_Write_an_Extractor_Plugin"></a></h3><p>An <i>Extractor Plugin</i> is a class:</p><ul><li>implementing the <a href="./xref/org/apache/any23/plugin/ExtractorPlugin.html">ExtractorPlugin</a> interface;</li><li>packaged under <b>org.apache.any23.plugin</b> .<p>An example of plugin is defined below.</p><div class="source"><pre class="prettyprint">@Author(name=&quot;Michele Mostarda (mostarda@fbk.eu)&quot;)
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Apache Any23 Plugins<a name="Apache_Any23_Plugins"></a></h2>
+<div class="section">
+<h3>Introduction<a name="Introduction"></a></h3>
+<p>This section describes the <i>Apache Any23</i> plugins support.</p>
+<p><i>Apache Any23</i> comes with a set of predefined plugins. Such plugins are located under the <i>any23-root</i>/<b>plugins</b> dir.</p>
+<p>A plugin is a standard <i>Maven3</i> module containing any implementation of</p>
+<ul>
+<li><a href="./xref/org/apache/any23/plugin/ExtractorPlugin.html">ExtractorPlugin</a></li>
+<li><a href="./xref/org/apache/any23/cli/Tool.html">Tool</a></li></ul></div>
+<div class="section">
+<h3>How to Register a Plugin<a name="How_to_Register_a_Plugin"></a></h3>
+<p>A plugin can be added to the <i>Apache Any23 CLI</i> interface by:</p>
+<ul>
+<li>adding its <i>JAR</i> to the <i>Apache Any23</i> <i>JVM classpath</i>;</li>
+<li>adding its <i>JAR</i> to the CLASSPATH_PREFIX environment variable as:
+<div class="source">
+<pre>export CLASSPATH_PREFIX=../../../plugins/basic-crawler/target/any23-basic-crawler-VERSION.jar</pre></div></li>
+<li>adding its <i>JAR</i> to the <i>$HOME/.any23/plugins</i> directory.
+<p>A plugin can be added to the <i>Apache Any23 library API</i> by using the <a href="./xref/org/apache/any23/plugin/Any23PluginManager.html">Any23PluginManager</a>#createInstance(Configuration configuration, File... pluginLocations) method.</p>
+<p>TODO: plugin support in Apache Any23 Service</p>
+<p>Any implementation of <i>ExtractorPlugin</i> will automatically registered to the <a href="./xref/org/apache/any23/extractor/ExtractorRegistry.html">ExtractorRegistry</a>.</p>
+<p>Any detected implementation of <i>Tool</i> will be listed by the <i>ToolRunner</i> command-line tool in <i>any23-root/</i><b>bin/any23</b> .</p></li></ul></div>
+<div class="section">
+<h3>How to Build a Plugin<a name="How_to_Build_a_Plugin"></a></h3>
+<p><i>Apache Any23</i> takes care to <i>test</i> and <i>package</i> plugins when distributed from its reactor <i>POM</i>. It is aways possible to rebuild a plugin using the command:</p>
+<div class="source">
+<pre>&lt;plugin-dir&gt;$ mvn clean assembly:assembly</pre></div></div>
+<div class="section">
+<h3>How to Write an Extractor Plugin<a name="How_to_Write_an_Extractor_Plugin"></a></h3>
+<p>An <i>Extractor Plugin</i> is a class:</p>
+<ul>
+<li>implementing the <a href="./xref/org/apache/any23/plugin/ExtractorPlugin.html">ExtractorPlugin</a> interface;</li>
+<li>packaged under <b>org.apache.any23.plugin</b> .
+<p>An example of plugin is defined below.</p>
+<div class="source">
+<pre>@Author(name=&quot;Michele Mostarda (mostarda@fbk.eu)&quot;)
 public class HTMLScraperPlugin implements ExtractorPlugin {
 
     private static final Logger logger = LoggerFactory.getLogger(HTMLScraperPlugin.class);
@@ -233,7 +269,17 @@ public class HTMLScraperPlugin implement
         return HTMLScraperExtractor.factory;
     }
 
-}</pre></div></li></ul></div><div class="section"><h3>How to Write a Tool Plugin<a name="How_to_Write_a_Tool_Plugin"></a></h3><p>A <i>Tool Plugin</i> is a Java class that:</p><ul><li>implementing the <a href="./xref/org/apache/any23/cli/Tool.html">Tool</a> interface;</li><li>CLI parameters are extracted by annotating the class members with <a class="externalLink" href="http://jcommander.org/">JCommander</a> annotations.</li><li>have to be found using the <a class="externalLink" href="http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html">ServiceLoader</a> (we usually plug the Kohsuke's <a class="externalLink" href="http://weblogs.java.net/blog/kohsuke/archive/2009/03/my_project_of_t.html">generator</a>)<p>An example of plugin is defined below.</p><div class="source"><pre class="prettyprint">@MetaInfServices
+}</pre></div></li></ul></div>
+<div class="section">
+<h3>How to Write a Tool Plugin<a name="How_to_Write_a_Tool_Plugin"></a></h3>
+<p>A <i>Tool Plugin</i> is a Java class that:</p>
+<ul>
+<li>implementing the <a href="./xref/org/apache/any23/cli/Tool.html">Tool</a> interface;</li>
+<li>CLI parameters are extracted by annotating the class members with <a class="externalLink" href="http://jcommander.org/">JCommander</a> annotations.</li>
+<li>have to be found using the <a class="externalLink" href="http://docs.oracle.com/javase/6/docs/api/java/util/ServiceLoader.html">ServiceLoader</a> (we usually plug the Kohsuke's <a class="externalLink" href="http://weblogs.java.net/blog/kohsuke/archive/2009/03/my_project_of_t.html">generator</a>)
+<p>An example of plugin is defined below.</p>
+<div class="source">
+<pre>@MetaInfServices
 @Parameters(commandNames = { &quot;myexec&quot; }, commandDescription = &quot;Prints out XXX used by Any23.&quot;)
 public class MyExecutableTool implements Tool {
 
@@ -244,7 +290,22 @@ public class MyExecutableTool implements
         
     }
 
-}</pre></div></li></ul><p>So when executing <tt>any23&gt;&gt;, the &lt;&lt;&lt;myexec</tt> will be available in the commands list.</p></div><div class="section"><h3>Available Extractor Plugins<a name="Available_Extractor_Plugins"></a></h3><ul><li>HTML Scraper Plugin<p>The <i>HTMLScraperPlugin</i> is able to scrape plain text content from any HTML page and transform it into statement literals.</p><p>This plugin is documented <a href="./plugin-html-scraper.html">here</a>.</p></li><li>Office Scraper Plugins<p>The <i>Office Scraper Plugins</i> allow to extract semantic content from several <i>Microsoft Office</i> document formats.</p><p>These plugins are documented <a href="./plugin-office-scraper.html">here</a>.</p></li></ul></div><div class="section"><h3>Available CLI Tool Plugins<a name="Available_CLI_Tool_Plugins"></a></h3><ul><li>Crawler CLI Tool<p>The <a href="./xref/org/apache/any23/cli/Crawler.html">Crawler CLI Tool</a> is an extension of the <a href="./xref/org/apache/any23/cli/Rover.html">Rover CLI Tool</a> to add site crawling basic capabilities. More information about the <i>CLI</i> can be found at <a href="./getting-started.html#crawler-tool">Getting Started - Crawler Tool</a> section.</p></li></ul></div></div>
+}</pre></div></li></ul>
+<p>So when executing <tt>any23&gt;&gt;, the &lt;&lt;&lt;myexec</tt> will be available in the commands list.</p></div>
+<div class="section">
+<h3>Available Extractor Plugins<a name="Available_Extractor_Plugins"></a></h3>
+<ul>
+<li>HTML Scraper Plugin
+<p>The <i>HTMLScraperPlugin</i> is able to scrape plain text content from any HTML page and transform it into statement literals.</p>
+<p>This plugin is documented <a href="./plugin-html-scraper.html">here</a>.</p></li>
+<li>Office Scraper Plugins
+<p>The <i>Office Scraper Plugins</i> allow to extract semantic content from several <i>Microsoft Office</i> document formats.</p>
+<p>These plugins are documented <a href="./plugin-office-scraper.html">here</a>.</p></li></ul></div>
+<div class="section">
+<h3>Available CLI Tool Plugins<a name="Available_CLI_Tool_Plugins"></a></h3>
+<ul>
+<li>Crawler CLI Tool
+<p>The <a href="./xref/org/apache/any23/cli/Crawler.html">Crawler CLI Tool</a> is an extension of the <a href="./xref/org/apache/any23/cli/Rover.html">Rover CLI Tool</a> to add site crawling basic capabilities. More information about the <i>CLI</i> can be found at <a href="./getting-started.html#crawler-tool">Getting Started - Crawler Tool</a> section.</p></li></ul></div></div>
                   </div>
             </div>
           </div>

Modified: any23/site/build-src.html
URL: http://svn.apache.org/viewvc/any23/site/build-src.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/build-src.html (original)
+++ any23/site/build-src.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Build from sources</title>
+    <title>Apache Any23 - Apache Any23 - Build from sources</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -279,7 +279,34 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Build Apache Any23 from sources<a name="Build_Apache_Any23_from_sources"></a></h2><p>This page describes how to build <b>Apache Any23</b>.</p><div class="section"><h3>Access a Snapshot Version<a name="Access_a_Snapshot_Version"></a></h3><p>For the latest snapshot please checkout the code from the SVN code repository and build the library. Checkout the code from SVN:</p><div class="source"><pre class="prettyprint">$ svn checkout http://svn.apache.org/repos/asf/any23/trunk apache-any23-trunk-readonly</pre></div></div><div class="section"><h3>Build <b>Apache Any23</b><a name="Build_Apache_Any23"></a></h3><p>The following instructions describe how to build the library with <a class="externalLink" href="http://maven.apache.org/">Maven 2.x.y+</a>. For specific information about Maven see: <a class="externalLink" href="http://maven.apache.org/"></a> Go to the trunk folder:</p><div class="source"><pre class="prettyprint">$ cd trunk/</pre></div><p>and execute the following command:</p><div class="source"><pre class="prettyprint">trunk$ mvn clean install</pre></div><p>This will install the <b>Apache Any23</b> artifact and its dependencies in your local M2 repository.</p></div><div class="section"><h3>Generate Documentation<a name="Generate_Documentation"></a></h3><p>To generate the project site locally execute the following command from the trunk dir:</p><div class="source"><pre class="prettyprint">trunk$ MAVEN_OPTS='-Xmx1024m' mvn clean site</pre></div><p>You can speed up the site generation process specifying the offline option ( -o ), but it works only if all the involved plugin dependencies has been already downloaded in the local M2 repository:</p><div class="source"><pre class="prettyprint">trunk$ MAVEN_OPTS='-Xmx1024m' mvn -o clean site</pre></div><p>If you're interested in generating the Javadoc enriched with navigable UML graphs, you can activate the umlgraphdoc profile. This profile relies on <a class="externalLink" href="http://www.graphviz.org/">Graphviz</a> that must be installed in your system.</p><div class="source"><pre class="prettyprint">trunk$ MAVEN_OPTS='-Xmx256m' mvn -P umlgraphdoc clean site</pre></div></div></div>
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Build Apache Any23 from sources<a name="Build_Apache_Any23_from_sources"></a></h2>
+<p>This page describes how to build <b>Apache Any23</b>.</p>
+<div class="section">
+<h3>Access a Snapshot Version<a name="Access_a_Snapshot_Version"></a></h3>
+<p>For the latest snapshot, check out the code from the SVN repository and build the library. To check out the code from SVN:</p>
+<div class="source">
+<pre>$ svn checkout http://svn.apache.org/repos/asf/any23/trunk apache-any23-trunk-readonly</pre></div></div>
+<div class="section">
+<h3>Build <b>Apache Any23</b><a name="Build_Apache_Any23"></a></h3>
+<p>The following instructions describe how to build the library with <a class="externalLink" href="http://maven.apache.org/">Maven 2.x.y+</a>. For specific information about Maven see <a class="externalLink" href="http://maven.apache.org/">http://maven.apache.org/</a>. Go to the trunk folder:</p>
+<div class="source">
+<pre>$ cd trunk/</pre></div>
+<p>and execute the following command:</p>
+<div class="source">
+<pre>trunk$ mvn clean install</pre></div>
+<p>This will install the <b>Apache Any23</b> artifact and its dependencies in your local M2 repository.</p></div>
+<div class="section">
+<h3>Generate Documentation<a name="Generate_Documentation"></a></h3>
+<p>To generate the project site locally, execute the following command from the trunk directory:</p>
+<div class="source">
+<pre>trunk$ MAVEN_OPTS='-Xmx1024m' mvn clean site</pre></div>
+<p>You can speed up site generation by specifying the offline option (-o), but this works only if all the required plugin dependencies have already been downloaded to the local M2 repository:</p>
+<div class="source">
+<pre>trunk$ MAVEN_OPTS='-Xmx1024m' mvn -o clean site</pre></div>
+<p>To generate the Javadoc enriched with navigable UML graphs, activate the umlgraphdoc profile. This profile relies on <a class="externalLink" href="http://www.graphviz.org/">Graphviz</a>, which must be installed on your system.</p>
+<div class="source">
+<pre>trunk$ MAVEN_OPTS='-Xmx256m' mvn -P umlgraphdoc clean site</pre></div></div></div>
                   </div>
             </div>
           </div>

Modified: any23/site/configuration.html
URL: http://svn.apache.org/viewvc/any23/site/configuration.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/configuration.html (original)
+++ any23/site/configuration.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Configuration</title>
+    <title>Apache Any23 - Apache Any23 - Configuration</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -214,16 +214,114 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Configuration<a name="Configuration"></a></h2><div class="section"><h3>Configure the Core Module<
 a name="Configure_the_Core_Module"></a></h3><p>The core module contains the main library code and the command-line implementation.</p><p>The main library configuration parameters are managed by the <a href="./xref/org/apache/any23/configuration/DefaultConfiguration.html"> Configuration</a> class. The default values are declared within the <a class="externalLink" href="http://any23.googlecode.com/svn/trunk/any23-core/src/main/resources/default-configuration.properties"> default-configuration.properties</a> file. The following sections explain how to override the default configuration.</p><div class="section"><h4>Override Default Configuration from Command-line<a name="Override_Default_Configuration_from_Command-line"></a></h4><p>The default configuration can be overriden via command-line by passing to the <b>java</b> command system properties with the same name of the ones declared in configuration.</p><p>For example to override the <b>HTTP Max Client Connections</b> parameter it is 
 sufficient to add the following option to the <b>java</b> command-line invocation:</p><div class="source"><pre class="prettyprint">-Dany23.http.client.max.connections=10</pre></div><p>any23, any23tools and any23server scripts accept the variable <b>ANY23_OPTS</b> to specify custom options. It is possible to customize the <b>HTTP Max Client Connections</b> for the <b>any23</b> script simply using:</p><div class="source"><pre class="prettyprint">any23-core/bin/$ ANY23_OPTS=&quot;-Dany23.http.client.max.connections=10&quot; any23 http://path/to/resource</pre></div></div><div class="section"><h4>Override Default Configuration Programmatically<a name="Override_Default_Configuration_Programmatically"></a></h4><p>The <a href="./xref/org/apache/any23/configuration/Configuration.html"> Configuration</a> properties can be accessed in read-only mode just retrieving the configuration <b>singleton</b> instance.<br />Such instance is <i>immutable</i>:</p><div class="source"><pre class="prettyprin
 t">final Configuration immutableConf = DefaultConfiguration.singleton();
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Configuration<a name="Configuration"></a></h2>
+<div class="section">
+<h3>Configure the Core Module<a name="Configure_the_Core_Module"></a></h3>
+<p>The core module contains the main library code and the command-line implementation.</p>
+<p>The main library configuration parameters are managed by the <a href="./xref/org/apache/any23/configuration/DefaultConfiguration.html"> Configuration</a> class. The default values are declared within the <a class="externalLink" href="http://any23.googlecode.com/svn/trunk/any23-core/src/main/resources/default-configuration.properties"> default-configuration.properties</a> file. The following sections explain how to override the default configuration.</p>
+<div class="section">
+<h4>Override Default Configuration from Command-line<a name="Override_Default_Configuration_from_Command-line"></a></h4>
+<p>The default configuration can be overridden from the command line by passing the <b>java</b> command system properties with the same names as the ones declared in the configuration.</p>
+<p>For example, to override the <b>HTTP Max Client Connections</b> parameter, add the following option to the <b>java</b> command-line invocation:</p>
+<div class="source">
+<pre>-Dany23.http.client.max.connections=10</pre></div>
+<p>The any23, any23tools and any23server scripts accept the <b>ANY23_OPTS</b> variable for specifying custom options. For example, to customize the <b>HTTP Max Client Connections</b> for the <b>any23</b> script:</p>
+<div class="source">
+<pre>any23-core/bin/$ ANY23_OPTS=&quot;-Dany23.http.client.max.connections=10&quot; any23 http://path/to/resource</pre></div></div>
+<div class="section">
+<h4>Override Default Configuration Programmatically<a name="Override_Default_Configuration_Programmatically"></a></h4>
+<p>The <a href="./xref/org/apache/any23/configuration/Configuration.html"> Configuration</a> properties can be accessed in read-only mode by retrieving the configuration <b>singleton</b> instance.<br />This instance is <i>immutable</i>:</p>
+<div class="source">
+<pre>final Configuration immutableConf = DefaultConfiguration.singleton();
 final String propertyValue = immutableConf.getProperty(&quot;propertyName&quot;, &quot;default value&quot;);
-...</pre></div><p>To obtain a <i>modifiable</i> <a href="./xref/org/apache/any23/configuration/Configuration.html"> Configuration</a> instead it is possible to use the <b>copy()</b> method.<br />One of the <b>Apache Any23</b> constructors accepts a <b>Configuration</b> object that allows to customize the behavior of the <b>Apache Any23</b> instance for its entire life-cycle.</p><div class="source"><pre class="prettyprint">final ModifiableConfiguration modifiableConf = DefaultConfiguration.copy();
+...</pre></div>
+<p>To obtain a <i>modifiable</i> <a href="./xref/org/apache/any23/configuration/Configuration.html"> Configuration</a> instead, use the <b>copy()</b> method.<br />One of the <b>Apache Any23</b> constructors accepts a <b>Configuration</b> object, which customizes the behavior of the <b>Apache Any23</b> instance for its entire life cycle.</p>
+<div class="source">
+<pre>final ModifiableConfiguration modifiableConf = DefaultConfiguration.copy();
 final String oldPropertyValue = modifiableConf.setProperty(&quot;propertyName&quot;, &quot;new property value&quot;);
 final Any23 any23 = new Any23(modifiableConf, &quot;extractor1&quot;, ...);
-...</pre></div></div></div><div class="section"><h3>Use of ExtractionParameters<a name="Use_of_ExtractionParameters"></a></h3><p>It is possible to customize the behavior of a single data extraction by providing an <a href="./xref/org/apache/any23/extractor/ExtractionParameters.html"> ExtractionParameters</a> instance to one the <i>Apache Any23#extract()</i> methods accepting it. <b>ExtractionParameters</b> allows to customize any <i>property</i> and <i>flag</i> other then the <b>specific extraction options</b>.<br />If no custom parameters are specified the default configuration values are used.</p><div class="source"><pre class="prettyprint">final Apache Any23 any23 = ...
+...</pre></div></div></div>
+<div class="section">
+<h3>Use of ExtractionParameters<a name="Use_of_ExtractionParameters"></a></h3>
+<p>The behavior of a single data extraction can be customized by providing an <a href="./xref/org/apache/any23/extractor/ExtractionParameters.html"> ExtractionParameters</a> instance to one of the <i>Any23#extract()</i> methods accepting it. <b>ExtractionParameters</b> allows customizing any <i>property</i> and <i>flag</i> other than the <b>specific extraction options</b>.<br />If no custom parameters are specified, the default configuration values are used.</p>
+<div class="source">
+<pre>final Any23 any23 = ...
 final TripleHandler tripleHandler = ...
 final ExtractionParameters extractionParameters = ExtractionParameters.getDefault();
 extractionParameters.setFlag(&quot;any23.microdata.strict&quot;, true);
-any23.extract(extractionParameters, &quot;http://path/to/doc&quot;, tripleHandler);</pre></div></div><div class="section"><h3>Apache Any23 Core Module Default Configuration<a name="Apache_Any23_Core_Module_Default_Configuration"></a></h3><table border="1" class="table table-striped"><tr class="a"><td align="left">Property Name</td><td align="left">Default Property Value</td><td align="left">Description</td></tr><tr class="b"><td align="left">any23.core.version</td><td align="left"><i>current any23 core version</i></td><td align="left">String declaring the Apache Any23 Core module version.</td></tr><tr class="a"><td align="left">any23.http.user.agent.default</td><td align="left">Apache Any23-CLI</td><td align="left">User Agent Name used for HTTP requests.</td></tr><tr class="b"><td align="left">any23.http.client.timeout</td><td align="left">10000 (10 secs)</td><td align="left">Timeout in milliseconds for a HTTP request.</td></tr><tr class="a"><td align="left">any23.http.client.max.co
 nnections</td><td align="left">5</td><td align="left">Max number of concurrent HTTP connections allowed by the internal Apache Any23 HTTP client.</td></tr><tr class="b"><td align="left">any23.rdfa.extractor.xslt</td><td align="left">rdfa.xslt</td><td align="left">XSLT Stylesheet to be used to perform HTML to RDF extraction of RDFa.</td></tr><tr class="a"><td align="left">any23.extraction.metadata.timesize</td><td align="left">off (possible values: on/off)</td><td align="left">Activates/deactivates the generation of time and size metadata triples.</td></tr><tr class="b"><td align="left">any23.extraction.metadata.nesting</td><td align="left">on (possible values: on/off)</td><td align="left">Activates/deactivates the generation of nesting triples for Microformat entities.</td></tr><tr class="a"><td align="left">any23.extraction.metadata.domain.per.entity</td><td align="left">on (possible values: on/off)</td><td align="left">Activates/deactivates the generation of domain triple per enti
 ty.</td></tr><tr class="b"><td align="left">any23.extraction.rdfa.programmatic</td><td align="left">on (possible values: on/off)</td><td align="left">Switches between the programmatic RDFa 1.1 Extractor and the RDFa 1.0 XSLT base one.</td></tr><tr class="a"><td align="left">any23.extraction.context.uri</td><td align="left">?(means current document URI)</td><td align="left">Default value for extraction content URI.</td></tr><tr class="b"><td align="left">any23.plugin.dirs</td><td align="left">./plugins</td><td align="left">Directory containing Apache Any23 plugins.</td></tr><tr class="a"><td align="left">any23.microdata.strict</td><td align="left">on (possible values: on/off)</td><td align="left">Activates/deactivates the microdata strict validation.</td></tr><tr class="b"><td align="left">any23.microdata.ns.default</td><td align="left">http://rdf.data-vocabulary.org/</td><td align="left">Microdata default namespace.</td></tr><tr class="a"><td align="left">any23.extraction.head.meta<
 /td><td align="left">on (possible values: on/off)</td><td align="left">Activates/deactivates the HTMLMetaExtractor.</td></tr><tr class="b"><td align="left">any23.extraction.csv.field</td><td align="left">,</td><td align="left">CSVExtractor field separator.</td></tr><tr class="a"><td align="left">any23.extraction.csv.comment</td><td align="left">#</td><td align="left">CSVExtractor line comment marker.</td></tr></table></div></div>
+any23.extract(extractionParameters, &quot;http://path/to/doc&quot;, tripleHandler);</pre></div></div>
+<div class="section">
+<h3>Apache Any23 Core Module Default Configuration<a name="Apache_Any23_Core_Module_Default_Configuration"></a></h3>
+<table border="1" class="table table-striped">
+<tr class="a">
+<td align="left">Property Name</td>
+<td align="left">Default Property Value</td>
+<td align="left">Description</td></tr>
+<tr class="b">
+<td align="left">any23.core.version</td>
+<td align="left"><i>current any23 core version</i></td>
+<td align="left">String declaring the Apache Any23 Core module version.</td></tr>
+<tr class="a">
+<td align="left">any23.http.user.agent.default</td>
+<td align="left">Apache Any23-CLI</td>
+<td align="left">User Agent Name used for HTTP requests.</td></tr>
+<tr class="b">
+<td align="left">any23.http.client.timeout</td>
+<td align="left">10000 (10 secs)</td>
+<td align="left">Timeout in milliseconds for an HTTP request.</td></tr>
+<tr class="a">
+<td align="left">any23.http.client.max.connections</td>
+<td align="left">5</td>
+<td align="left">Max number of concurrent HTTP connections allowed by the internal Apache Any23 HTTP client.</td></tr>
+<tr class="b">
+<td align="left">any23.rdfa.extractor.xslt</td>
+<td align="left">rdfa.xslt</td>
+<td align="left">XSLT Stylesheet to be used to perform HTML to RDF extraction of RDFa.</td></tr>
+<tr class="a">
+<td align="left">any23.extraction.metadata.timesize</td>
+<td align="left">off (possible values: on/off)</td>
+<td align="left">Activates/deactivates the generation of time and size metadata triples.</td></tr>
+<tr class="b">
+<td align="left">any23.extraction.metadata.nesting</td>
+<td align="left">on (possible values: on/off)</td>
+<td align="left">Activates/deactivates the generation of nesting triples for Microformat entities.</td></tr>
+<tr class="a">
+<td align="left">any23.extraction.metadata.domain.per.entity</td>
+<td align="left">on (possible values: on/off)</td>
+<td align="left">Activates/deactivates the generation of domain triple per entity.</td></tr>
+<tr class="b">
+<td align="left">any23.extraction.rdfa.programmatic</td>
+<td align="left">on (possible values: on/off)</td>
+<td align="left">Switches between the programmatic RDFa 1.1 Extractor and the RDFa 1.0 XSLT-based one.</td></tr>
+<tr class="a">
+<td align="left">any23.extraction.context.uri</td>
+<td align="left">? (means current document URI)</td>
+<td align="left">Default value for the extraction context URI.</td></tr>
+<tr class="b">
+<td align="left">any23.plugin.dirs</td>
+<td align="left">./plugins</td>
+<td align="left">Directory containing Apache Any23 plugins.</td></tr>
+<tr class="a">
+<td align="left">any23.microdata.strict</td>
+<td align="left">on (possible values: on/off)</td>
+<td align="left">Activates/deactivates the microdata strict validation.</td></tr>
+<tr class="b">
+<td align="left">any23.microdata.ns.default</td>
+<td align="left">http://rdf.data-vocabulary.org/</td>
+<td align="left">Microdata default namespace.</td></tr>
+<tr class="a">
+<td align="left">any23.extraction.head.meta</td>
+<td align="left">on (possible values: on/off)</td>
+<td align="left">Activates/deactivates the HTMLMetaExtractor.</td></tr>
+<tr class="b">
+<td align="left">any23.extraction.csv.field</td>
+<td align="left">,</td>
+<td align="left">CSVExtractor field separator.</td></tr>
+<tr class="a">
+<td align="left">any23.extraction.csv.comment</td>
+<td align="left">#</td>
+<td align="left">CSVExtractor line comment marker.</td></tr></table></div></div>
                   </div>
             </div>
           </div>

Modified: any23/site/dev-csv-extractor.html
URL: http://svn.apache.org/viewvc/any23/site/dev-csv-extractor.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/dev-csv-extractor.html (original)
+++ any23/site/dev-csv-extractor.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - CSV Extractor Algorithm</title>
+    <title>Apache Any23 - Apache Any23 - CSV Extractor Algorithm</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -279,9 +279,33 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>CSV Extractor Algorithm<a name="CSV_Extractor_Algorithm"></a></h2><p>The <a href="./xref/org/apac
 he/any23/extractor/csv/CSVExtractor.html">CSV Extractor</a> produces an RDF representation of a CSV file compliant with the <a class="externalLink" href="http://www.ietf.org/rfc/rfc4180.txt">RFC 4180</a> and that foresees an header. Such extractor relies on the presence of an header to use the named fields as RDF properties. Field delimiter could be automatically guessed or specified via <a href="./configuration.html">Apache Any23 Configuration</a>.</p><p>Given a document with URL <i>url</i>, <b>Apache Any23</b> uses the following algorithm to extract RDF:</p><ul><li>It tries to guess the fields delimiter and to detect the header</li><li>for each field <i>name</i>:<ul><li>if <i>name</i> is a valid URI keep it as an URI since could be derefenceable.</li><li>if <i>name</i> is not a valid URI, the associated RDF Property URI <i>propUri</i> will be in the form of: <i>url</i> concatenated <i>name</i></li><li>add label statement: <i>propUri</i> rdfs:label <i>name</i></li><li>add column in
 dex statement: <i>propUri</i> &lt;http://vocab.sindice.net/csv/rowPosition&gt; <i>index</i></li></ul></li><li>for each <i>row</i>:<ul><li>add RDFS type statement: &lt;url/row/<i>index</i>&gt; rdfs:type &lt;http://vocab.sindice.net/csv/Row&gt;, where <i>index</i> is the column index number.</li><li>for each <i>cell</i> value:<ul><li>write statement, &lt;url/row/&lt;index&gt;&gt; <i>propUri</i> <i>cell</i> where: <i>cell</i> could be an URI if the cell value is an URI, or a typed literal according the value of the CSV actual value <i>cell</i>.</li></ul></li></ul></li><li>add RDF statements claiming number of rows and columns.</li></ul><p>For example, given this trivial CSV with an header and just two rows:</p><div class="source"><pre class="prettyprint">first name; last name; http://xmlns.org/foaf/01/knows; age
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>CSV Extractor Algorithm<a name="CSV_Extractor_Algorithm"></a></h2>
+<p>The <a href="./xref/org/apache/any23/extractor/csv/CSVExtractor.html">CSV Extractor</a> produces an RDF representation of a CSV file that is compliant with <a class="externalLink" href="http://www.ietf.org/rfc/rfc4180.txt">RFC 4180</a> and includes a header. The extractor relies on the presence of a header to use the named fields as RDF properties. The field delimiter can be guessed automatically or specified via <a href="./configuration.html">Apache Any23 Configuration</a>.</p>
+<p>Given a document with URL <i>url</i>, <b>Apache Any23</b> uses the following algorithm to extract RDF:</p>
+<ul>
+<li>guess the field delimiter and detect the header</li>
+<li>for each field <i>name</i>:
+<ul>
+<li>if <i>name</i> is a valid URI, keep it as a URI since it could be dereferenceable.</li>
+<li>if <i>name</i> is not a valid URI, the associated RDF Property URI <i>propUri</i> takes the form: <i>url</i> concatenated with <i>name</i></li>
+<li>add label statement: <i>propUri</i> rdfs:label <i>name</i></li>
+<li>add column index statement: <i>propUri</i> &lt;http://vocab.sindice.net/csv/rowPosition&gt; <i>index</i></li></ul></li>
+<li>for each <i>row</i>:
+<ul>
+<li>add RDF type statement: &lt;url/row/<i>index</i>&gt; rdf:type &lt;http://vocab.sindice.net/csv/Row&gt;, where <i>index</i> is the row index number.</li>
+<li>for each <i>cell</i> value:
+<ul>
+<li>write statement: &lt;url/row/&lt;index&gt;&gt; <i>propUri</i> <i>cell</i>, where <i>cell</i> is a URI if the cell value is a URI, or otherwise a literal typed according to the actual CSV value.</li></ul></li></ul></li>
+<li>add RDF statements stating the number of rows and columns.</li></ul>
+<p>For example, given this trivial CSV with a header and just two rows:</p>
+<div class="source">
+<pre>first name; last name; http://xmlns.org/foaf/01/knows; age
 Davide; Palmisano; http://michelemostarda.com; 30; value should not appear
-Michele; Mostarda; http://g1o.net;</pre></div><p>the following RDF (serialized in RDF/XML) is produced:</p><div class="source"><pre class="prettyprint">&lt;rdf:RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
+Michele; Mostarda; http://g1o.net;</pre></div>
+<p>the following RDF (serialized in RDF/XML) is produced:</p>
+<div class="source">
+<pre>&lt;rdf:RDF xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;&gt;
 
   &lt;rdf:Description rdf:about=&quot;http://bob.example.com/firstName&quot;&gt;
     &lt;label xmlns=&quot;http://www.w3.org/2000/01/rdf-schema#&quot;&gt;first name&lt;/label&gt;

Modified: any23/site/dev-data-conversion.html
URL: http://svn.apache.org/viewvc/any23/site/dev-data-conversion.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/dev-data-conversion.html (original)
+++ any23/site/dev-data-conversion.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Data Conversion</title>
+    <title>Apache Any23 - Apache Any23 - Data Conversion</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -279,7 +279,10 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Data Conversion<a name="Data_Conversion"></a></h2><div class="source"><pre class="prettyprint">/*1*/ Apache Any23 runner = new Apache Any23();
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Data Conversion<a name="Data_Conversion"></a></h2>
+<div class="source">
+<pre>/*1*/ Any23 runner = new Any23();
 /*2*/ final String content = &quot;@prefix foo: &lt;http://example.org/ns#&gt; .   &quot; +
                              &quot;@prefix : &lt;http://other.example.org/ns#&gt; .&quot; +
                              &quot;foo:bar foo: : .                          &quot; +
@@ -293,7 +296,18 @@
       } finally {
 /*7*/     handler.close();
       }
-/*8*/ String nt = out.toString(&quot;UTF-8&quot;);</pre></div><p>This example aims to demonstrate how to use <b>Apache Any23</b> to perform RDF data conversion. In this code we provide some input data expressed as <b>Turtle</b> and convert it in <b>NTriples</b> format.</p><p>At <b>line 1</b> we define a new instance of the <b>Apache Any23</b> facade, that provides all the methods useful for the transformation. The facade constructor accepts a list of extractor names, if specified the extraction will be done only over this list, otherwise the data <i>MIME Type</i> will detected and will be applied all the compatible extractors declared within the <a href="./xref/org/apache/any23/extractor/ExtractorRegistry.html">ExtractorRegistry</a>.</p><p>The <b>line 2</b> defines the input string containing some <a class="externalLink" href="http://www.w3.org/TeamSubmission/turtle/">Turtle</a> data.</p><p>At <b>line 3</b> we instantiate a <a href="./xref/org/apache/any23/source/StringDocumentSource.html">StringDocumentSource</a>, specifying a content and a the source <i>URI</i>. The <i>URI</i> should be the source of the content data, and must be valid. Besides the <a href="./xref/org/apache/any23/source/StringDocumentSource.html">StringDocumentSource</a>, you can also provide input from other sources, such as <i>HTTP</i> requests and local files. See the classes in the sources <a href="./xref/org/apache/any23/source/package-summary.html">package</a>.</p><p>The <b>line 4</b> defines a buffered output stream that will be used to store the data produced by the writer declared at <b>line 5</b>.</p><p>A writer stores the extracted triples in some destination. We use an <a href="./xref/org/apache/any23/writer/NTriplesWriter.html">NTriplesWriter</a> here that writes into a <b>ByteArrayOutputStream</b>. The main <b>RDF</b> formats writers are available and it is possible also to store the triples directly into a <b>Sesame</b> repository to query them via <b>SPARQL</b>. See <a href="./xref/org/apache/any23/writer/RepositoryWriter.html">RepositoryWriter</a> and the writer <a href="./xref/org/apache/any23/writer/package-summary.html">package</a>.</p><p>The extractor method invoked at <b>line 6</b> performs the metadata extraction. This method accepts as first argument a <a href="./xref/org/apache/any23/source/DocumentSource.html">DocumentSource</a> and as second argument a <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a>, that will receive the sequence parsing events generated by the applied extractors. The extract method defines also another signature where it is possible to specify a charset encoding for the input data. If <b>null</b>, the charset will be auto detected.</p><p>The <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> needs to be explicitly closed, this is done safely in a <b>finally</b> block at <b>line 7</b>.</p><p>The expected output is <i>UTF-8</i> encoded at <b>line 8</b>:</p><div class="source"><pre class="prettyprint">&lt;http://example.org/ns#bar&gt; &lt;http://example.org/ns#&gt; &lt;http://other.example.org/ns#&gt; .
+/*8*/ String nt = out.toString(&quot;UTF-8&quot;);</pre></div>
+<p>This example demonstrates how to use <b>Apache Any23</b> to perform RDF data conversion. The code takes input data expressed as <b>Turtle</b> and converts it to <b>NTriples</b> format.</p>
+<p>At <b>line 1</b> we create a new instance of the <b>Apache Any23</b> facade, which provides all the methods needed for the transformation. The facade constructor accepts a list of extractor names. If the list is specified, extraction is performed only with those extractors; otherwise the <i>MIME Type</i> of the data is detected and all compatible extractors declared within the <a href="./xref/org/apache/any23/extractor/ExtractorRegistry.html">ExtractorRegistry</a> are applied.</p>
+<p><b>Line 2</b> defines the input string containing some <a class="externalLink" href="http://www.w3.org/TeamSubmission/turtle/">Turtle</a> data.</p>
+<p>At <b>line 3</b> we instantiate a <a href="./xref/org/apache/any23/source/StringDocumentSource.html">StringDocumentSource</a>, specifying the content and the source <i>URI</i>. The <i>URI</i> should identify the source of the content data and must be valid. Besides the <a href="./xref/org/apache/any23/source/StringDocumentSource.html">StringDocumentSource</a>, you can also provide input from other sources, such as <i>HTTP</i> requests and local files. See the classes in the sources <a href="./xref/org/apache/any23/source/package-summary.html">package</a>.</p>
+<p><b>Line 4</b> defines a buffered output stream that stores the data produced by the writer declared at <b>line 5</b>.</p>
+<p>A writer stores the extracted triples in some destination. Here we use an <a href="./xref/org/apache/any23/writer/NTriplesWriter.html">NTriplesWriter</a> that writes into a <b>ByteArrayOutputStream</b>. Writers for the main <b>RDF</b> formats are available, and it is also possible to store the triples directly in a <b>Sesame</b> repository and query them via <b>SPARQL</b>. See <a href="./xref/org/apache/any23/writer/RepositoryWriter.html">RepositoryWriter</a> and the writer <a href="./xref/org/apache/any23/writer/package-summary.html">package</a>.</p>
+<p>The extract method invoked at <b>line 6</b> performs the metadata extraction. It accepts a <a href="./xref/org/apache/any23/source/DocumentSource.html">DocumentSource</a> as its first argument and a <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> as its second; the handler receives the sequence of parsing events generated by the applied extractors. The extract method also has another signature that allows a charset encoding to be specified for the input data. If it is <b>null</b>, the charset is auto-detected.</p>
+<p>The <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> must be explicitly closed; this is done safely in a <b>finally</b> block at <b>line 7</b>.</p>
+<p>The expected output, decoded as <i>UTF-8</i> at <b>line 8</b>, is:</p>
+<div class="source">
+<pre>&lt;http://example.org/ns#bar&gt; &lt;http://example.org/ns#&gt; &lt;http://other.example.org/ns#&gt; .
 &lt;http://other.example.org/ns#bar&gt; &lt;http://other.example.org/ns#&gt; &lt;http://example.org/ns#bar&gt; .</pre></div></div>
                   </div>
             </div>

Modified: any23/site/dev-data-extraction.html
URL: http://svn.apache.org/viewvc/any23/site/dev-data-extraction.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/dev-data-extraction.html (original)
+++ any23/site/dev-data-extraction.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Data Extraction</title>
+    <title>Apache Any23 - Apache Any23 - Data Extraction</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -279,7 +279,10 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Data Extraction<a name="Data_Extraction"></a></h2><div class="source"><pre class="prettyprint">/*1*/ Apache Any23 runner = new Apache Any23();
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Data Extraction<a name="Data_Extraction"></a></h2>
+<div class="source">
+<pre>/*1*/ Any23 runner = new Any23();
 /*2*/ runner.setHTTPUserAgent(&quot;test-user-agent&quot;);
 /*3*/ HTTPClient httpClient = runner.getHTTPClient();
 /*4*/ DocumentSource source = new HTTPDocumentSource(
@@ -293,7 +296,16 @@
       } finally {
 /*8*/     handler.close();
       }
-/*9*/ String n3 = out.toString(&quot;UTF-8&quot;);</pre></div><p>This example demonstrates the data extraction, that is the main purpose of <b>Apache Any23</b> library. At <b>line 1</b> we define the <b>Apache Any23</b> facade instance. As described before, the constructor allows to enforce the usage of specific extractors.</p><p>The <b>line 2</b> defines the <i>HTTP User Agent</i>, used to identify the client during <i>HTTP</i> data collection. At <b>line 3</b> we use the runner to create an instance of <a href="./xref/org/apache/any23/http/HTTPClient.html">HTTPClient</a>, used by <a href="./xref/org/apache/any23/source/HTTPDocumentSource.html">HTTPDocumentSource</a> for <i>HTTP</i> content fetching.</p><p>The <b>line 4</b> instantiates an <a href="./xref/org/apache/any23/source/HTTPDocumentSource.html">HTTPDocumentSource</a> instance, specifying the <a href="./xref/org/apache/any23/http/HTTPClient.html">HTTPClient</a> and the URL addressing the content to be processed.</p><p>At <b>line 5</b> we define a buffered output stream used to store data produced by the <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> defined at <b>line 6</b>.</p><p>The extraction method at <b>line 7</b> will run the metadata extraction. The produced metadata will be written within the passed <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> instance.</p><p>The <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> needs to be explicitly closed, this is done safely in a <b>finally</b> block at <b>line 8</b>.</p><p>The expected output is <i>UTF-8</i> encoded at <b>line 9</b> and is:</p><div class="source"><pre class="prettyprint">&lt;http://www.rentalinrome.com/semanticloft/semanticloft.htm&gt; &lt;http://purl.org/dc/terms/title&gt;
+/*9*/ String n3 = out.toString(&quot;UTF-8&quot;);</pre></div>
+<p>This example demonstrates data extraction, which is the main purpose of the <b>Apache Any23</b> library. At <b>line 1</b> we create the <b>Apache Any23</b> facade instance. As described before, the constructor can restrict extraction to specific extractors.</p>
+<p><b>Line 2</b> defines the <i>HTTP User Agent</i>, used to identify the client during <i>HTTP</i> data collection. At <b>line 3</b> we use the runner to obtain an instance of <a href="./xref/org/apache/any23/http/HTTPClient.html">HTTPClient</a>, used by <a href="./xref/org/apache/any23/source/HTTPDocumentSource.html">HTTPDocumentSource</a> for <i>HTTP</i> content fetching.</p>
+<p><b>Line 4</b> instantiates an <a href="./xref/org/apache/any23/source/HTTPDocumentSource.html">HTTPDocumentSource</a>, specifying the <a href="./xref/org/apache/any23/http/HTTPClient.html">HTTPClient</a> and the URL of the content to be processed.</p>
+<p>At <b>line 5</b> we define a buffered output stream used to store the data produced by the <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> defined at <b>line 6</b>.</p>
+<p>The extract method at <b>line 7</b> runs the metadata extraction. The produced metadata is written to the <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> instance passed to it.</p>
+<p>The <a href="./xref/org/apache/any23/writer/TripleHandler.html">TripleHandler</a> must be explicitly closed; this is done safely in a <b>finally</b> block at <b>line 8</b>.</p>
+<p>The expected output, decoded as <i>UTF-8</i> at <b>line 9</b>, is:</p>
+<div class="source">
+<pre>&lt;http://www.rentalinrome.com/semanticloft/semanticloft.htm&gt; &lt;http://purl.org/dc/terms/title&gt;
 &quot;Semantic Loft (beta) - Trastevere apartments | Rental in Rome - rentalinrome.com&quot; .
 
 &lt;http://www.rentalinrome.com/semanticloft/semanticloft.htm#semanticloft&gt;
@@ -316,7 +328,14 @@
 &lt;http://www.w3.org/2006/vcard/ns#adr&gt;
 _:node14r93a8dex1 .
 
-[The complete output is omitted for brevity.]</pre></div></div><div class="section"><h2>Filter Out Accidental Triples<a name="Filter_Out_Accidental_Triples"></a></h2><p>To remove accidental triples <b>Apache Any23</b> provides a set of useful filters, located within the <b>org.apache.any23.filter</b> package.</p><p>The filter <a href="./xref/org/apache/any23/filter/IgnoreTitlesOfEmptyDocuments.html">IgnoreTitlesOfEmptyDocuments</a> removes triples generated by the <a href="./xref/org/apache/any23/extractor/html/TitleExtractor.html">TitleExtractor</a> whether the document is empty.</p><p>The filter <a href="./xref/org/apache/any23/filter/IgnoreAccidentalRDFa.html">IgnoreAccidentalRDFa</a> removes accidental <b>CSS</b> related triples.</p><div class="source"><pre class="prettyprint">RDFWriter rdfWriter = ...
+[The complete output is omitted for brevity.]</pre></div></div>
+<div class="section">
+<h2>Filter Out Accidental Triples<a name="Filter_Out_Accidental_Triples"></a></h2>
+<p>To remove accidental triples, <b>Apache Any23</b> provides a set of filters, located within the <b>org.apache.any23.filter</b> package.</p>
+<p>The filter <a href="./xref/org/apache/any23/filter/IgnoreTitlesOfEmptyDocuments.html">IgnoreTitlesOfEmptyDocuments</a> removes triples generated by the <a href="./xref/org/apache/any23/extractor/html/TitleExtractor.html">TitleExtractor</a> when the document is empty.</p>
+<p>The filter <a href="./xref/org/apache/any23/filter/IgnoreAccidentalRDFa.html">IgnoreAccidentalRDFa</a> removes accidental <b>CSS</b>-related triples.</p>
+<div class="source">
+<pre>RDFWriter rdfWriter = ...
 TripleHandler rdfWriterHandler = RDFWriterTripleHandler(rdfWriter);
 TripleHandler tripleHandler = new ReportingTripleHandler(
         new IgnoreAccidentalRDFa(

Modified: any23/site/dev-microdata-extractor.html
URL: http://svn.apache.org/viewvc/any23/site/dev-microdata-extractor.html?rev=1538470&r1=1538469&r2=1538470&view=diff
==============================================================================
--- any23/site/dev-microdata-extractor.html (original)
+++ any23/site/dev-microdata-extractor.html Sun Nov  3 21:57:50 2013
@@ -1,6 +1,6 @@
 <!DOCTYPE html>
 <!--
- | Generated by Apache Maven Doxia at Jun 26, 2013
+ | Generated by Apache Maven Doxia at 2013-11-03
  | Rendered using Apache Maven Fluido Skin 1.3.0
 -->
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
@@ -8,9 +8,9 @@
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
     <meta name="author" content="The Apache Software Foundation" />
-    <meta name="Date-Revision-yyyymmdd" content="20130626" />
+    <meta name="Date-Revision-yyyymmdd" content="20131103" />
     <meta http-equiv="Content-Language" content="en" />
-    <title>Apache Any23 - Microdata Extractor</title>
+    <title>Apache Any23 - Apache Any23 - Microdata Extractor</title>
     <link rel="stylesheet" href="./css/apache-maven-fluido-1.3.0.min.css" />
     <link rel="stylesheet" href="./css/site.css" />
     <link rel="stylesheet" href="./css/print.css" media="print" />
@@ -42,8 +42,8 @@
         <ul class="breadcrumb">
                 
                     
-                  <li id="publishDate">Last Published: 2013-06-26</li>
-                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.0-SNAPSHOT</li>
+                  <li id="publishDate">Last Published: 2013-11-03</li>
+                  <li class="divider">|</li> <li id="projectVersion">Version: 0.9.1-SNAPSHOT</li>
                       
                 
                     
@@ -279,7 +279,11 @@
                 
         <div id="bodyColumn"  class="span9" >
                                   
-            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section"><h2>Microdata Extractor<a name="Microdata_Extractor"></a></h2><p>The <b>Microdata</b> extractor is compliant with the <b>W3C</b> draft specification at <a class="externalLink" href="http://www.w3.org/TR/microdata/">http://www.w3.org/TR/microdata/</a>.</p><p>Such extractor produces an RDF representation of the detected Microdata within an <b>XHTML5</b> document, following the algorithm at section <a class="externalLink" href="http://www.w3.org/TR/microdata/#rdf">http://www.w3.org/TR/microdata/#rdf</a>.</p><p>It is possible to retrieve the <b>JSON</b> representation of the same Microdata as defined at section <a class="externalLink" href="http://www.w3.org/TR/microdata/#json">http://www.w3.org/TR/microdata/#json</a> by using the Microdata commandline tool, see <a href="./getting-started.html#any23tools_script">Getting Started - Apache Any23 Tools</a>.</p></div>
+            <!-- Licensed to the Apache Software Foundation (ASF) under one or more --><!-- contributor license agreements.  See the NOTICE file distributed with --><!-- this work for additional information regarding copyright ownership. --><!-- The ASF licenses this file to You under the Apache License, Version 2.0 --><!-- (the "License"); you may not use this file except in compliance with --><!-- the License.  You may obtain a copy of the License at --><!--  --><!-- http://www.apache.org/licenses/LICENSE-2.0 --><!--  --><!-- Unless required by applicable law or agreed to in writing, software --><!-- distributed under the License is distributed on an "AS IS" BASIS, --><!-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. --><!-- See the License for the specific language governing permissions and --><!-- limitations under the License. --><div class="section">
+<h2>Microdata Extractor<a name="Microdata_Extractor"></a></h2>
+<p>The <b>Microdata</b> extractor is compliant with the <b>W3C</b> draft specification at <a class="externalLink" href="http://www.w3.org/TR/microdata/">http://www.w3.org/TR/microdata/</a>.</p>
+<p>This extractor produces an RDF representation of the Microdata detected within an <b>XHTML5</b> document, following the algorithm in section <a class="externalLink" href="http://www.w3.org/TR/microdata/#rdf">http://www.w3.org/TR/microdata/#rdf</a>.</p>
+<p>The <b>JSON</b> representation of the same Microdata, as defined in section <a class="externalLink" href="http://www.w3.org/TR/microdata/#json">http://www.w3.org/TR/microdata/#json</a>, can be retrieved with the Microdata command-line tool; see <a href="./getting-started.html#any23tools_script">Getting Started - Apache Any23 Tools</a>.</p></div>
                   </div>
             </div>
           </div>


