Return-Path: X-Original-To: apmail-incubator-any23-commits-archive@minotaur.apache.org Delivered-To: apmail-incubator-any23-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 43849D592 for ; Tue, 17 Jul 2012 21:49:27 +0000 (UTC) Received: (qmail 8815 invoked by uid 500); 17 Jul 2012 21:49:27 -0000 Delivered-To: apmail-incubator-any23-commits-archive@incubator.apache.org Received: (qmail 8777 invoked by uid 500); 17 Jul 2012 21:49:27 -0000 Mailing-List: contact any23-commits-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: any23-dev@incubator.apache.org Delivered-To: mailing list any23-commits@incubator.apache.org Received: (qmail 8768 invoked by uid 99); 17 Jul 2012 21:49:27 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Jul 2012 21:49:27 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 17 Jul 2012 21:49:21 +0000 Received: from eris.apache.org (localhost [127.0.0.1]) by eris.apache.org (Postfix) with ESMTP id 57DC42388BEC; Tue, 17 Jul 2012 21:48:36 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Subject: svn commit: r1362675 [13/26] - in /incubator/any23/site: ./ apache-any23-core/ apache-any23-core/css/ apache-any23-core/images/ apache-any23-core/images/logos/ apache-any23-core/images/profiles/ apache-any23-core/js/ apache-any23-service/ apache-any23-... 
Date: Tue, 17 Jul 2012 21:48:30 -0000 To: any23-commits@incubator.apache.org From: simonetripodi@apache.org X-Mailer: svnmailer-1.0.8-patched Message-Id: <20120717214836.57DC42388BEC@eris.apache.org> Added: incubator/any23/site/images/any23-overall.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/any23-overall.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/any23/site/images/any23-overall.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/apache-tika-90x30.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/apache-tika-90x30.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/any23/site/images/apache-tika-90x30.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/fu-logo-90x25.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/fu-logo-90x25.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/any23/site/images/fu-logo-90x25.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/kit-logo-90x40.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/kit-logo-90x40.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. 
Propchange: incubator/any23/site/images/kit-logo-90x40.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/logo-lod2-90x30.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/logo-lod2-90x30.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/any23/site/images/logo-lod2-90x30.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/profiles/pre-release.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/profiles/pre-release.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/any23/site/images/profiles/pre-release.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/profiles/retired.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/profiles/retired.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/any23/site/images/profiles/retired.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/images/profiles/sandbox.png URL: http://svn.apache.org/viewvc/incubator/any23/site/images/profiles/sandbox.png?rev=1362675&view=auto ============================================================================== Binary file - no diff available. 
Propchange: incubator/any23/site/images/profiles/sandbox.png ------------------------------------------------------------------------------ svn:mime-type = image/png Added: incubator/any23/site/plugin-basic-crawler.html URL: http://svn.apache.org/viewvc/incubator/any23/site/plugin-basic-crawler.html?rev=1362675&view=auto ============================================================================== --- incubator/any23/site/plugin-basic-crawler.html (added) +++ incubator/any23/site/plugin-basic-crawler.html Tue Jul 17 21:48:21 2012 @@ -0,0 +1,211 @@
Apache Any23 - Plugins - Basic Crawler

Basic Crawler Plugin

The Basic Crawler Plugin implements a CLI tool that extends Rover with site-crawling capabilities.

The tool can be used to extract semantic content from small to medium-sized sites.

To use it, make sure the basic-crawler plugin is configured so that the any23tools script can find it (follow the instructions in the Plugins section):

core/bin/$ ./any23tools Crawler
usage: [{<url>|<file>}]+ [-d <arg>] [-e <arg>] [-f <arg>] [-h] [-l <arg>]
       [-maxdepth <arg>] [-maxpages <arg>] [-n] [-numcrawlers <arg>] [-o
       <arg>] [-p] [-pagefilter <arg>] [-politenessdelay <arg>] [-s]
       [-storagefolder <arg>] [-t] [-v]
 -d,--defaultns <arg>       Override the default namespace used to produce
                            statements.
 -e <arg>                   Specify a comma-separated list of extractors,
                            e.g. rdf-xml,rdf-turtle.
 -f,--Output format <arg>   [turtle (default), rdfxml, ntriples, nquads,
                            trix, json, uri]
 -h,--help                  Print this help.
 -l,--log <arg>             Produce log within a file.
 -maxdepth <arg>            Max allowed crawler depth. Default: no limit.
 -maxpages <arg>            Max number of pages before interrupting crawl.
                            Default: no limit.
 -n,--nesting               Disable production of nesting triples.
 -numcrawlers <arg>         Sets the number of crawlers. Default: 10
 -o,--output <arg>          Specify Output file (defaults to standard
                            output).
 -p,--pedantic              Validate and fixes HTML content detecting
                            commons issues.
 -pagefilter <arg>          Regex used to filter out page URLs during
                            crawling. Default:
                            '.*(\.(css|js|bmp|gif|jpe?g|png|tiff?|mid|mp2|
                            mp3|mp4|wav|wma|avi|mov|mpeg|ram|m4v|wmv|rm|sm
                            il|pdf|swf|zip|rar|gz|xml|txt))$'
 -politenessdelay <arg>     Politeness delay in milliseconds. Default: no
                            limit.
 -s,--stats                 Print out extraction statistics.
 -storagefolder <arg>       Folder used to store crawler temporary data.
                            Default:
                            [/var/folders/d5/c_0b4h1d7t1gx6tzz_dn5cj40000g
                            q/T/]
 -t,--notrivial             Filter trivial statements (e.g. CSS related
                            ones).
 -v,--verbose               Show debug and progress information.
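The default -pagefilter value in the help output is wrapped across three lines (note that "smil" is split into "sm" and "il"). As a rough illustration of which URLs that default excludes, the POSIX shell sketch below rejoins the fragments and tests a few sample URLs with `grep -E`. This is an approximation: the plugin itself applies the pattern as a Java regular expression, and the `is_filtered` helper and the example.org URLs are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Default -pagefilter regex from the help text, with the wrapped fragments
# ("mp2|" + "mp3", "sm" + "il") rejoined onto one line.
PAGEFILTER='.*(\.(css|js|bmp|gif|jpe?g|png|tiff?|mid|mp2|mp3|mp4|wav|wma|avi|mov|mpeg|ram|m4v|wmv|rm|smil|pdf|swf|zip|rar|gz|xml|txt))$'

# Hypothetical helper (not part of Any23): exits 0 when the URL matches the
# filter, i.e. when it would be filtered out of the crawl. POSIX ERE via
# grep -E is used here as a stand-in for the Java regex engine.
is_filtered() {
    printf '%s\n' "$1" | grep -Eq -- "$PAGEFILTER"
}

# Classify a few sample URLs.
for url in \
    http://example.org/index.html \
    http://example.org/style.css \
    http://example.org/photo.jpeg \
    http://example.org/about
do
    if is_filtered "$url"; then
        echo "skip  $url"
    else
        echo "crawl $url"
    fi
done
```

Under this sketch, a URL is skipped only when it ends in one of the listed extensions, so HTML pages and extension-less paths are still crawled. The pattern is case-sensitive as written, so a URL such as `photo.JPG` would not match it here.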
Propchange: incubator/any23/site/plugin-basic-crawler.html ------------------------------------------------------------------------------ svn:eol-style = native Propchange: incubator/any23/site/plugin-basic-crawler.html ------------------------------------------------------------------------------ svn:keywords = Date Revision Author HeadURL Id Propchange: incubator/any23/site/plugin-basic-crawler.html ------------------------------------------------------------------------------ svn:mime-type = text/html