incubator-connectors-commits mailing list archives

From kwri...@apache.org
Subject svn commit: r1034600 - in /incubator/lcf/site: publish/ src/documentation/content/xdocs/
Date Fri, 12 Nov 2010 23:32:52 GMT
Author: kwright
Date: Fri Nov 12 23:32:51 2010
New Revision: 1034600

URL: http://svn.apache.org/viewvc?rev=1034600&view=rev
Log:
Add programmatic operation page as part of actual site.

Added:
    incubator/lcf/site/publish/programmatic-operation.html
    incubator/lcf/site/publish/programmatic-operation.pdf   (with props)
    incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml   (with props)
Modified:
    incubator/lcf/site/publish/developer-resources.html
    incubator/lcf/site/publish/developer-resources.pdf
    incubator/lcf/site/publish/end-user-documentation.pdf
    incubator/lcf/site/publish/faq.pdf
    incubator/lcf/site/publish/how-to-build-and-deploy.pdf
    incubator/lcf/site/publish/index.pdf
    incubator/lcf/site/publish/linkmap.pdf
    incubator/lcf/site/publish/mail.pdf
    incubator/lcf/site/publish/who.pdf
    incubator/lcf/site/src/documentation/content/xdocs/developer-resources.xml

Modified: incubator/lcf/site/publish/developer-resources.html
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/developer-resources.html?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
--- incubator/lcf/site/publish/developer-resources.html (original)
+++ incubator/lcf/site/publish/developer-resources.html Fri Nov 12 23:32:51 2010
@@ -258,7 +258,7 @@ document.write("Last Published: " + docu
 <a name="N10039"></a><a name="howtointegrate"></a>
 <h2 class="h3">How to Integrate</h2>
 <div class="section">
-<p>ManifoldCF provides a number of API's and services.  Documentation of these API's can be found <a href="http://cwiki.apache.org/confluence/display/CONNECTORS/Programmatic+Operation+of+ManifoldCF">here</a>.
+<p>ManifoldCF provides a number of APIs and services.  Documentation of these APIs can be found <a href="programmatic-operation.html">here</a>.
           </p>
 </div>
     

Modified: incubator/lcf/site/publish/developer-resources.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/developer-resources.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Binary files - no diff available.

Modified: incubator/lcf/site/publish/end-user-documentation.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/end-user-documentation.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Files incubator/lcf/site/publish/end-user-documentation.pdf (original) and incubator/lcf/site/publish/end-user-documentation.pdf Fri Nov 12 23:32:51 2010 differ

Modified: incubator/lcf/site/publish/faq.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/faq.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Files incubator/lcf/site/publish/faq.pdf (original) and incubator/lcf/site/publish/faq.pdf Fri Nov 12 23:32:51 2010 differ

Modified: incubator/lcf/site/publish/how-to-build-and-deploy.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/how-to-build-and-deploy.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Binary files - no diff available.

Modified: incubator/lcf/site/publish/index.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/index.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Binary files - no diff available.

Modified: incubator/lcf/site/publish/linkmap.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/linkmap.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Binary files - no diff available.

Modified: incubator/lcf/site/publish/mail.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/mail.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Binary files - no diff available.

Added: incubator/lcf/site/publish/programmatic-operation.html
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/programmatic-operation.html?rev=1034600&view=auto
==============================================================================
--- incubator/lcf/site/publish/programmatic-operation.html (added)
+++ incubator/lcf/site/publish/programmatic-operation.html Fri Nov 12 23:32:51 2010
@@ -0,0 +1,1006 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.9-dev">
+<meta name="Forrest-skin-name" content="lucene">
+<title>Programmatic Operation</title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://www.apache.org"><img class="logoImage" alt="Apache" src="images/apache_feather.gif" title="Apache Software Foundation"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://incubator.apache.org/lcf"><img class="logoImage" alt="Apache ManifoldCF" src="images/ManifoldCF-logo.PNG" title="ManifoldCF"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.lucidimagination.com/search/" method="get" class="roundtopsmall">
+<input onFocus="getBlank (this, 'Search the site with Solr');" size="25" name="q" id="query" type="text" value="Search the site with Solr">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+<div style="position: relative; top: -5px; left: -10px">Powered by <a href="http://www.lucidimagination.com" style="color: #033268">Lucid Imagination</a>
+</div>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li class="current">
+<a class="selected" href="index.html">Main</a>
+</li>
+<li>
+<a class="unselected" href="http://cwiki.apache.org/confluence/display/CONNECTORS/Index">Wiki</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_1.1', 'skin/')" id="menu_1.1Title" class="menutitle">About</div>
+<div id="menu_1.1" class="menuitemgroup">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menuitem">
+<a href="who.html">Who We Are</a>
+</div>
+<div class="menuitem">
+<a href="mail.html">Mailing Lists</a>
+</div>
+<div class="menuitem">
+<a href="http://www.cafepress.com/lucene/">Buy Stuff</a>
+</div>
+<div class="menuitem">
+<a href="http://www.apache.org/foundation/sponsorship.html">Sponsor Apache</a>
+</div>
+<div class="menuitem">
+<a href="http://www.apache.org/foundation/thanks.html">Sponsors of Apache</a>
+</div>
+</div>
+<div onclick="SwitchMenu('menu_1.2', 'skin/')" id="menu_1.2Title" class="menutitle">Documentation</div>
+<div id="menu_1.2" class="menuitemgroup">
+<div class="menuitem">
+<a href="faq.html">Frequently Asked Questions</a>
+</div>
+<div class="menuitem">
+<a href="developer-resources.html">Developer/Integrator Resources</a>
+</div>
+<div class="menuitem">
+<a href="end-user-documentation.html">End-user Documentation</a>
+</div>
+</div>
+<div onclick="SwitchMenu('menu_1.3', 'skin/')" id="menu_1.3Title" class="menutitle">Related-Projects</div>
+<div id="menu_1.3" class="menuitemgroup">
+<div class="menuitem">
+<a href="http://incubator.apache.org/droids/">Droids</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/java/">Java</a>
+</div>
+<div class="menuitem">
+<a href="http://incubator.apache.org/lucene.net/">Lucene.Net</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/lucy/">Lucy</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/mahout/">Mahout</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/nutch/">Nutch</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/openrelevance/">Open Relevance</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/pylucene/">PyLucene</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/solr/">Solr</a>
+</div>
+<div class="menuitem">
+<a href="http://lucene.apache.org/tika/">Tika</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="programmatic-operation.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<h1>Programmatic Operation</h1>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#Programmatic+Operation">Programmatic Operation</a>
+<ul class="minitoc">
+<li>
+<a href="#Control+by+Servlet+API">Control by Servlet API</a>
+<ul class="minitoc">
+<li>
+<a href="#Output+connector+objects">Output connector objects</a>
+</li>
+<li>
+<a href="#Authority+connector+objects">Authority connector objects</a>
+</li>
+<li>
+<a href="#Repository+connector+objects">Repository connector objects</a>
+</li>
+<li>
+<a href="#Output+connection+objects">Output connection objects</a>
+</li>
+<li>
+<a href="#Authority+connection+objects">Authority connection objects</a>
+</li>
+<li>
+<a href="#Repository+connection+objects">Repository connection objects</a>
+</li>
+<li>
+<a href="#Job+objects">Job objects</a>
+</li>
+<li>
+<a href="#Job+status+objects">Job status objects</a>
+</li>
+<li>
+<a href="#Connection-type-specific+objects">Connection-type-specific objects</a>
+</li>
+<li>
+<a href="#File+system+connector">File system connector</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Control+via+Commands">Control via Commands</a>
+</li>
+<li>
+<a href="#Control+by+direct+code">Control by direct code</a>
+</li>
+<li>
+<a href="#Caveats">Caveats</a>
+</li>
+</ul>
+</li>
+</ul>
+</div> 
+    
+<a name="N1000D"></a><a name="Programmatic+Operation"></a>
+<h2 class="h3">Programmatic Operation</h2>
+<div class="section">
+<p></p>
+<p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of achieving this control.</p>
+<p></p>
+<a name="N1001A"></a><a name="Control+by+Servlet+API"></a>
+<h3 class="h4">Control by Servlet API</h3>
+<p></p>
+<p>ManifoldCF provides a servlet-based JSON API that gives you the complete ability to define connections and jobs, and control job execution.  You can read about JSON <a href="http://www.json.org">here</a>.  The API is designed to be RESTful in character.  Thus, it makes full use of the HTTP verbs GET, PUT, POST, and DELETE, and represents objects as URLs.  The basic format of the JSON servlet resource URLs is as follows:</p>
+<p></p>
+<p>http[s]://<em>&lt;server_and_port&gt;</em>/mcf-api-service/json/<em>&lt;resource&gt;</em>
+</p>
+<p></p>
+<p>The servlet ignores request data, except when the PUT or POST verb is used.  In that case, the request data is presumed to be a JSON object.  The servlet responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or 404 (NOT FOUND) response code along with a response JSON object.</p>
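The URL format and response codes described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name McfApiSketch, its method names, and the host http://localhost:8345 are hypothetical and are not part of ManifoldCF. It builds a resource URL from the documented pattern, notes which verbs carry a JSON request body, and maps the documented status codes onto coarse outcome labels.

```java
import java.util.Locale;

/** Hypothetical helper: builds JSON API resource URLs and classifies status codes. */
final class McfApiSketch {
    /** Joins the server base (scheme, server and port) with the documented servlet path. */
    static String resourceUrl(String serverBase, String resource) {
        String base = serverBase.endsWith("/")
            ? serverBase.substring(0, serverBase.length() - 1)
            : serverBase;
        return base + "/mcf-api-service/json/" + resource;
    }

    /** Request data is only read for PUT and POST; for other verbs it is ignored. */
    static boolean sendsBody(String verb) {
        String v = verb.toUpperCase(Locale.ROOT);
        return v.equals("PUT") || v.equals("POST");
    }

    /** Maps the status codes named in the text onto a coarse label. */
    static String classify(int status) {
        switch (status) {
            case 200: return "ok";
            case 201: return "created";
            case 404: return "not found";
            case 400:
            case 500: return "error";
            default:  return "unexpected";
        }
    }

    public static void main(String[] args) {
        // The host and port here are placeholders, not a documented default.
        System.out.println(resourceUrl("http://localhost:8345", "outputconnections"));
        System.out.println(sendsBody("get"));
        System.out.println(classify(201));
    }
}
```

A client would typically call resourceUrl once per request and consult classify before parsing the response JSON object, since an error response carries an explanatory message rather than the requested object.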
+<p></p>
+<p>The actual available resources and commands are as follows:</p>
+<p></p>
+<p></p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
+<tr>
+<th colspan="1" rowspan="1">Resource</th><th colspan="1" rowspan="1">Verb</th><th colspan="1" rowspan="1">What it does</th><th colspan="1" rowspan="1">Input format</th><th colspan="1" rowspan="1">Output format</th>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">outputconnectors</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all registered output connectors</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"outputconnector":[<em>&lt;list_of_output_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">authorityconnectors</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all registered authority connectors</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"authorityconnector":[<em>&lt;list_of_authority_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">repositoryconnectors</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all registered repository connectors</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"repositoryconnector":[<em>&lt;list_of_repository_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">outputconnections</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all output connections</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"outputconnection":[<em>&lt;list_of_output_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Get a specific output connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"outputconnection":<em>&lt;output_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Save or create an output connection</td><td colspan="1" rowspan="1">{"outputconnection":<em>&lt;output_connection_object&gt;</em>}</td><td colspan="1" rowspan="1">{"connection_name":<em>&lt;connection_name&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">DELETE</td><td colspan="1" rowspan="1">Delete an output connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">status/outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Check the status of an output connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">info/outputconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Retrieve arbitrary connector-specific resource</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1"><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">authorityconnections</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all authority connections</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"authorityconnection":[<em>&lt;list_of_authority_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Get a specific authority connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"authorityconnection":<em>&lt;authority_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Save or create an authority connection</td><td colspan="1" rowspan="1">{"authorityconnection":<em>&lt;authority_connection_object&gt;</em>}</td><td colspan="1" rowspan="1">{"connection_name":<em>&lt;connection_name&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">DELETE</td><td colspan="1" rowspan="1">Delete an authority connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">status/authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Check the status of an authority connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">repositoryconnections</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all repository connections</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"repositoryconnection":[<em>&lt;list_of_repository_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Get a specific repository connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"repositoryconnection":<em>&lt;repository_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Save or create a repository connection</td><td colspan="1" rowspan="1">{"repositoryconnection":<em>&lt;repository_connection_object&gt;</em>}</td><td colspan="1" rowspan="1">{"connection_name":<em>&lt;connection_name&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">DELETE</td><td colspan="1" rowspan="1">Delete a repository connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">status/repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Check the status of a repository connection</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">info/repositoryconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Retrieve arbitrary connector-specific resource</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1"><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobs</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all job definitions</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"job":[<em>&lt;list_of_job_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobs</td><td colspan="1" rowspan="1">POST</td><td colspan="1" rowspan="1">Create a job</td><td colspan="1" rowspan="1">{"job":<em>&lt;job_object&gt;</em>}</td><td colspan="1" rowspan="1">{"job_id":<em>&lt;job_identifier&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobs/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Get a specific job definition</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"job":<em>&lt;job_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobs/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Save a job definition</td><td colspan="1" rowspan="1">{"job":<em>&lt;job_object&gt;</em>}</td><td colspan="1" rowspan="1">{"job_id":<em>&lt;job_identifier&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobs/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">DELETE</td><td colspan="1" rowspan="1">Delete a job definition</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobstatuses</td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">List all jobs and their status</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"jobstatus":[<em>&lt;list_of_job_status_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">jobstatuses/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">GET</td><td colspan="1" rowspan="1">Get a specific job's status</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">start/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Start a specified job manually</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">abort/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Abort a specified job</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">restart/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Stop and start a specified job</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">pause/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Pause a specified job</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">resume/<em>&lt;job_id&gt;</em></td><td colspan="1" rowspan="1">PUT</td><td colspan="1" rowspan="1">Resume a specified job</td><td colspan="1" rowspan="1">N/A</td><td colspan="1" rowspan="1">{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td>
+</tr>
+        
+</table>
+<p></p>
+<p>Other resources having to do with reports have been planned but not yet implemented.</p>
+<p></p>
+<a name="N10480"></a><a name="Output+connector+objects"></a>
+<h4>Output connector objects</h4>
+<p></p>
+<p>The JSON fields an output connector object has are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">The optional description of the connector</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"class_name"</td><td colspan="1" rowspan="1">The class name of the class implementing the connector</td>
+</tr>
+          
+</table>
+<p></p>
+<a name="N104B1"></a><a name="Authority+connector+objects"></a>
+<h4>Authority connector objects</h4>
+<p></p>
+<p>The JSON fields an authority connector object has are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">The optional description of the connector</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"class_name"</td><td colspan="1" rowspan="1">The class name of the class implementing the connector</td>
+</tr>
+          
+</table>
+<p></p>
+<a name="N104E2"></a><a name="Repository+connector+objects"></a>
+<h4>Repository connector objects</h4>
+<p></p>
+<p>The JSON fields a repository connector object has are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">The optional description of the connector</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"class_name"</td><td colspan="1" rowspan="1">The class name of the class implementing the connector</td>
+</tr>
+          
+</table>
+<p></p>
+<a name="N10513"></a><a name="Output+connection+objects"></a>
+<h4>Output connection objects</h4>
+<p></p>
+<p>Output connection names, when they are part of a URL, should be encoded as follows:</p>
+<p></p>
+<ol>
+            
+<li>All instances of '.' should be replaced by '..'.</li>
+            
+<li>All instances of '/' should be replaced by '.+'.</li>
+            
+<li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          
+</ol>
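The three encoding steps above can be sketched in Java as follows. The class name ConnectionNameEncoder is hypothetical. One assumption is worth flagging: java.net.URLEncoder performs form encoding, which turns a space into "+", so the sketch rewrites that "+" to "%20" to make the result suitable for a URL path segment. The order of the first two steps matters, because the "." introduced by ".+" must not be doubled again. The same rules are listed for authority and repository connection names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: encodes a connection name for use inside a resource URL. */
final class ConnectionNameEncoder {
    static String encode(String name) {
        try {
            // Step 1: double every '.'; step 2: replace every '/' with ".+".
            String escaped = name.replace(".", "..").replace("/", ".+");
            // Step 3: UTF-8 %-encoding. URLEncoder emits '+' only for spaces
            // (a literal '+' becomes "%2B"), so the final replace is safe.
            return URLEncoder.encode(escaped, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("a/b.c"));          // prints a.%2Bb..c
        System.out.println(encode("My Connection"));  // prints My%20Connection
    }
}
```

The encoded value is what goes in place of the encoded_connection_name placeholder in the resource paths, for example outputconnections/My%20Connection.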
+<p></p>
+<p>The JSON fields an output connection object has are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"name"</td><td colspan="1" rowspan="1">The unique name of the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">The description of the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"class_name"</td><td colspan="1" rowspan="1">The Java class name of the class implementing the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"max_connections"</td><td colspan="1" rowspan="1">The total number of outstanding connections allowed to exist at a time</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"configuration"</td><td colspan="1" rowspan="1">The configuration object for the connection, which is specific to the connection class</td>
+</tr>
+          
+</table>
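As an illustration of how these fields fit together, the following Java sketch assembles a request body of the form {"outputconnection":...} that a PUT to an output connection resource expects. Several things here are assumptions rather than documented facts: the class name OutputConnectionPayload, the connector class com.example.MyOutputConnector, sending "max_connections" as a JSON string, and passing the connector-specific configuration through as caller-supplied JSON (its format is not described in this section).

```java
/** Hypothetical helper: assembles the JSON body for saving an output connection. */
final class OutputConnectionPayload {
    /** Wraps a value in double quotes, escaping only backslashes and quotes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String build(String name, String description, String className,
                        int maxConnections, String configurationJson) {
        return "{\"outputconnection\":{"
            + "\"name\":" + quote(name) + ","
            + "\"description\":" + quote(description) + ","
            + "\"class_name\":" + quote(className) + ","
            // Assumption: the count is sent as a JSON string; the source does not say.
            + "\"max_connections\":" + quote(Integer.toString(maxConnections)) + ","
            // The configuration object is connector-specific and passed through unchanged.
            + "\"configuration\":" + configurationJson
            + "}}";
    }

    public static void main(String[] args) {
        System.out.println(build("null-out", "Test output",
            "com.example.MyOutputConnector", 10, "{}"));
    }
}
```

In real use a JSON library would be a safer choice than string concatenation, since quote handles only the two most common escapes.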
+<p></p>
+<a name="N10575"></a><a name="Authority+connection+objects"></a>
+<h4>Authority connection objects</h4>
+<p></p>
+<p>Authority connection names, when they are part of a URL, should be encoded as follows:</p>
+<p></p>
+<ol>
+            
+<li>All instances of '.' should be replaced by '..'.</li>
+            
+<li>All instances of '/' should be replaced by '.+'.</li>
+            
+<li>The resulting name should be encoded using standard UTF-8-based URL %-encoding.</li>
+          
+</ol>
+<p></p>
+<p>The JSON fields for an authority connection object are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"name"</td><td colspan="1" rowspan="1">The unique name of the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">The description of the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"class_name"</td><td colspan="1" rowspan="1">The Java class name of the class implementing the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"max_connections"</td><td colspan="1" rowspan="1">The total number of outstanding connections allowed to exist at a time</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"configuration"</td><td colspan="1" rowspan="1">The configuration object for the connection, which is specific to the connection class</td>
+</tr>
+          
+</table>
+<p></p>
+<a name="N105D7"></a><a name="Repository+connection+objects"></a>
+<h4>Repository connection objects</h4>
+<p></p>
+<p>Repository connection names, when they are part of a URL, should be encoded as follows:</p>
+<p></p>
+<ol>
+            
+<li>All instances of '.' should be replaced by '..'.</li>
+            
+<li>All instances of '/' should be replaced by '.+'.</li>
+            
+<li>The URL should be encoded using standard URL utf-8-based %-encoding.</li>
+          
+</ol>
+<p></p>
+<p>The JSON fields for a repository connection object are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"name"</td><td colspan="1" rowspan="1">The unique name of the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">The description of the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"class_name"</td><td colspan="1" rowspan="1">The Java class name of the class implementing the connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"max_connections"</td><td colspan="1" rowspan="1">The total number of outstanding connections allowed to exist at a time</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"configuration"</td><td colspan="1" rowspan="1">The configuration object for the connection, which is specific to the connection class</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"acl_authority"</td><td colspan="1" rowspan="1">The (optional) name of the authority that will enforce security for this connection</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"throttle"</td><td colspan="1" rowspan="1">An array of throttle objects, which control how quickly documents can be requested from this connection</td>
+</tr>
+          
+</table>
+<p></p>
+<p>Each throttle object has the following fields:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"match"</td><td colspan="1" rowspan="1">The regular expression which is used to match a document's bins to determine if the throttle should be applied</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"match_description"</td><td colspan="1" rowspan="1">Optional text describing the meaning of the throttle</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"rate"</td><td colspan="1" rowspan="1">The maximum fetch rate to use if the throttle applies, in fetches per minute</td>
+</tr>
+          
+</table>
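+<p>As a rough sketch of how the fields above fit together, the following Python builds a repository connection object with one throttle and wraps it in the "repositoryconnection" envelope used for PUT requests.  The helper name, the class name "com.example.MyRepositoryConnector", the empty configuration object, and the choice of JSON numbers for "max_connections" and "rate" are all assumptions made for illustration; the real values depend on the connector and on what the API returns for an existing connection.</p>

```python
import json


def repository_connection(name, class_name, configuration, throttles=(),
                          description="", max_connections=10, acl_authority=None):
    """Build the JSON body for saving a repository connection (hypothetical helper)."""
    connection = {
        "name": name,
        "description": description,
        "class_name": class_name,
        "max_connections": max_connections,
        "configuration": configuration,      # connector-specific; left empty here
        "throttle": [dict(t) for t in throttles],
    }
    # "acl_authority" is optional, so it is only emitted when given.
    if acl_authority is not None:
        connection["acl_authority"] = acl_authority
    return {"repositoryconnection": connection}


if __name__ == "__main__":
    throttle = {"match": ".*", "match_description": "all bins", "rate": 60}
    body = repository_connection("My Files", "com.example.MyRepositoryConnector",
                                 {}, throttles=[throttle])
    print(json.dumps(body, indent=2))
```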
+<p></p>
+<a name="N1067F"></a><a name="Job+objects"></a>
+<h4>Job objects</h4>
+<p></p>
+<p>The JSON fields for a job are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"id"</td><td colspan="1" rowspan="1">The job's identifier, if present.  If not present, ManifoldCF will create one (and will also create the job when saved).</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"description"</td><td colspan="1" rowspan="1">Text describing the job</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"repository_connection"</td><td colspan="1" rowspan="1">The name of the repository connection to use with the job</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"output_connection"</td><td colspan="1" rowspan="1">The name of the output connection to use with the job</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"document_specification"</td><td colspan="1" rowspan="1">The document specification object for the job, whose format is repository-connection specific</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"output_specification"</td><td colspan="1" rowspan="1">The output specification object for the job, whose format is output-connection specific</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"start_mode"</td><td colspan="1" rowspan="1">The start mode for the job, which can be one of "schedule window start", "schedule window anytime", or "manual"</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"run_mode"</td><td colspan="1" rowspan="1">The run mode for the job, which can be either "continuous" or "scan once"</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"hopcount_mode"</td><td colspan="1" rowspan="1">The hopcount mode for the job, which can be one of "accurate", "no delete", or "never delete"</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"priority"</td><td colspan="1" rowspan="1">The job's priority, typically "5"</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"recrawl_interval"</td><td colspan="1" rowspan="1">The default time between recrawl of documents (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"expiration_interval"</td><td colspan="1" rowspan="1">The time until a document expires (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"reseed_interval"</td><td colspan="1" rowspan="1">The time between reseeding operations (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"hopcount"</td><td colspan="1" rowspan="1">An array of hopcount objects, describing the link types and associated maximum hops permitted for the job</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"schedule"</td><td colspan="1" rowspan="1">An array of schedule objects, describing when the job should be started and run</td>
+</tr>
+          
+</table>
+<p></p>
+<p>Each hopcount object has the following fields:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"link_type"</td><td colspan="1" rowspan="1">The connection-type-dependent type of a link for which a hop count restriction is specified</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"count"</td><td colspan="1" rowspan="1">The maximum number of hops allowed for the associated link type, starting at a seed</td>
+</tr>
+          
+</table>
+<p></p>
+<p>Each schedule object has the following fields:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"timezone"</td><td colspan="1" rowspan="1">The optional time zone for the schedule object; if not present the default server time zone is used</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"duration"</td><td colspan="1" rowspan="1">The optional length of the described time window, in milliseconds; if not present, duration is considered infinite</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"dayofweek"</td><td colspan="1" rowspan="1">The optional day-of-the-week enumeration object</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"monthofyear"</td><td colspan="1" rowspan="1">The optional month-of-the-year enumeration object</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"dayofmonth"</td><td colspan="1" rowspan="1">The optional day-of-the-month enumeration object</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"year"</td><td colspan="1" rowspan="1">The optional year enumeration object</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"hourofday"</td><td colspan="1" rowspan="1">The optional hour-of-the-day enumeration object</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"minutesofhour"</td><td colspan="1" rowspan="1">The optional minutes-of-the-hour enumeration object</td>
+</tr>
+          
+</table>
+<p></p>
+<p>Each enumeration object describes an array of integers using the form:</p>
+<p></p>
+<p>{"value":[<em>&lt;integer_list&gt;</em>]}</p>
+<p></p>
+<p>Each integer is a zero-based index describing which entity is being specified.  For example, for "dayofweek", 0 corresponds to Sunday, etc., and thus "dayofweek":{"value":[0,6]} would describe Saturdays and Sundays.</p>
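+<p>A minimal Python sketch of building such an enumeration object follows.  The helper name and the day-name table are illustrative; only the indexes 0 (Sunday) and 6 (Saturday) are stated above, and the values in between are assumed to follow in calendar order.</p>

```python
import json

# Zero-based day indexes; 0 = Sunday per the text, the rest assumed sequential.
DAY_INDEX = {
    "sunday": 0, "monday": 1, "tuesday": 2, "wednesday": 3,
    "thursday": 4, "friday": 5, "saturday": 6,
}


def enumeration(indexes):
    """Wrap integers in the {"value": [...]} form, sorted and de-duplicated."""
    return {"value": sorted(set(indexes))}


# Schedule fragment selecting Saturdays and Sundays.
weekend = {"dayofweek": enumeration([DAY_INDEX["saturday"], DAY_INDEX["sunday"]])}

if __name__ == "__main__":
    print(json.dumps(weekend, separators=(",", ":")))
```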
+<p></p>
+<a name="N107D0"></a><a name="Job+status+objects"></a>
+<h4>Job status objects</h4>
+<p></p>
+<p>The JSON fields of a job status object are as follows:</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+            
+<tr>
+<th colspan="1" rowspan="1">Field</th><th colspan="1" rowspan="1">Meaning</th>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"job_id"</td><td colspan="1" rowspan="1">The job identifier</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"status"</td><td colspan="1" rowspan="1">The job status, having the possible values: "not yet run", "running", "paused", "done", "waiting", "starting up", "cleaning up", "error", "aborting", "restarting", "running no connector", and "terminating"</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"error_text"</td><td colspan="1" rowspan="1">The error text, if the status is "error"</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"start_time"</td><td colspan="1" rowspan="1">The job start time, in milliseconds since Jan 1, 1970</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"end_time"</td><td colspan="1" rowspan="1">The job end time, in milliseconds since Jan 1, 1970</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"documents_in_queue"</td><td colspan="1" rowspan="1">The total number of documents in the queue for the job</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"documents_outstanding"</td><td colspan="1" rowspan="1">The number of documents for the job that are currently considered 'active'</td>
+</tr>
+            
+<tr>
+<td colspan="1" rowspan="1">"documents_processed"</td><td colspan="1" rowspan="1">The number of documents in the queue for the job that have been processed at least once</td>
+</tr>
+          
+</table>
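+<p>The following Python sketch shows one way a client might read a job status object.  The function names are hypothetical, and two assumptions are made: that the timestamps count from midnight UTC on Jan 1, 1970, and that they may arrive either as JSON numbers or as numeric strings (hence the int() conversion).</p>

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(value):
    """Convert milliseconds since Jan 1, 1970 (assumed UTC) to an aware datetime."""
    return datetime.fromtimestamp(int(value) / 1000, tz=timezone.utc)


def describe(status):
    """Render a one-line summary of a job status object."""
    parts = [f'job {status["job_id"]}: {status["status"]}']
    # "error_text" is only meaningful when the status is "error".
    if status["status"] == "error" and "error_text" in status:
        parts.append(f'({status["error_text"]})')
    if "start_time" in status:
        parts.append("started " + epoch_millis_to_utc(status["start_time"]).isoformat())
    return " ".join(parts)


if __name__ == "__main__":
    print(describe({"job_id": "1", "status": "running", "start_time": "86400000"}))
```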
+<p></p>
+<a name="N1083D"></a><a name="Connection-type-specific+objects"></a>
+<h4>Connection-type-specific objects</h4>
+<p></p>
+<p>As you may note when trying to use the above JSON API methods, you cannot get very far in defining connections or jobs without knowing the JSON format of a connection's configuration information, or a job's connection-specific document specification and output specification information.  The form of these objects is controlled by the Java implementation of the underlying connector, and is translated directly into JSON, so if you write your own connector you should be able to figure out what it will be in the API.  For connectors already part of ManifoldCF, it remains an ongoing task to document these connector-specific objects.  This task is not yet underway.</p>
+<p></p>
+<p>Luckily, it is pretty easy to learn a lot about the objects in question by simply creating connections and jobs in the ManifoldCF crawler UI, and then inspecting the resulting JSON objects through the API.  In this way, it should be possible to do a decent job of coding most API-based integrations.  The one place where difficulties will certainly occur will be if you try to completely replace the ManifoldCF crawler UI with one of your own.  This is because most connectors have methods that communicate with their respective back-ends in order to allow the user to select appropriate values.  For example, the path drill-down that is presented by the LiveLink connector requires that the connector interrogate the appropriate LiveLink repository in order to populate its path selection pull-downs.  There is, at this time, only one sanctioned way to accomplish the same job using the API, which is to use the appropriate "<em>connection_type</em>/execute/<em>type-specific_command</em>" command to perform the necessary functions.  Some set of useful functions has been coded for every appropriate connector, but the exact commands for every connector, and their JSON syntax, remain undocumented for now.</p>
+<p></p>
+<a name="N10856"></a><a name="File+system+connector"></a>
+<h4>File system connector</h4>
+<p></p>
+<p>The file system connector has no configuration information, and no connector-specific commands.  However, it does have document specification information.  The information looks something like this:</p>
+<p></p>
+<p>{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"},{"_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}</p>
+<p></p>
+<p>As you can see, multiple starting paths are possible, and the inclusion and exclusion rules also can be one or multiple.</p>
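+<p>A document specification of this shape can be assembled programmatically, which avoids hand-balancing brackets and escaping backslashes.  The Python below is a sketch only: the helper names are hypothetical, and the field names and the plain-string form of the exclusion rules are taken from the example above rather than from the connector source.</p>

```python
import json


def rule(kind, pattern):
    """One inclusion rule: kind is "file" or "directory", pattern is a wildcard."""
    return {"_attribute_type": kind, "_attribute_match": pattern}


def startpoint(path, include, exclude=()):
    """One starting path together with its inclusion and exclusion rules."""
    return {"_attribute_path": path, "include": list(include), "exclude": list(exclude)}


spec = {
    "startpoint": [
        startpoint(
            r"c:\path_to_files",
            [rule("file", "*.txt"), rule("file", "*.doc"), rule("directory", "*")],
            ["*.mov"],
        )
    ]
}

# json.dumps escapes the backslash in the Windows path as required by JSON.
text = json.dumps(spec)

if __name__ == "__main__":
    print(text)
```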
+<p></p>
+<p></p>
+<a name="N10871"></a><a name="Control+via+Commands"></a>
+<h3 class="h4">Control via Commands</h3>
+<p></p>
+<p>For script writers, there currently exist a number of ManifoldCF execution commands.  These commands are primarily rich in the area of definition of connections and jobs, controlling jobs, and running reports.  The following table lists the current suite.</p>
+<p></p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
+<tr>
+<th colspan="1" rowspan="1">Command</th><th colspan="1" rowspan="1">What it does</th>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.agents.DefineOutputConnection</td><td colspan="1" rowspan="1">Create a new output connection</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.agents.DeleteOutputConnection</td><td colspan="1" rowspan="1">Delete an existing output connection</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.authorities.ChangeAuthSpec</td><td colspan="1" rowspan="1">Modify an authority's configuration information</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.authorities.CheckAll</td><td colspan="1" rowspan="1">Check all authorities to be sure they are functioning</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.authorities.DefineAuthorityConnection</td><td colspan="1" rowspan="1">Create a new authority connection</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.authorities.DeleteAuthorityConnection</td><td colspan="1" rowspan="1">Delete an existing authority connection</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.AbortJob</td><td colspan="1" rowspan="1">Abort a running job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.AddScheduledTime</td><td colspan="1" rowspan="1">Add a schedule record to a job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.ChangeJobDocSpec</td><td colspan="1" rowspan="1">Modify a job's specification information</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.DefineJob</td><td colspan="1" rowspan="1">Create a new job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.DefineRepositoryConnection</td><td colspan="1" rowspan="1">Create a new repository connection</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.DeleteJob</td><td colspan="1" rowspan="1">Delete an existing job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.DeleteRepositoryConnection</td><td colspan="1" rowspan="1">Delete an existing repository connection</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.ExportConfiguration</td><td colspan="1" rowspan="1">Write the complete list of all connection definitions and job specifications to a file</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.FindJob</td><td colspan="1" rowspan="1">Locate a job identifier given a job's name</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.GetJobSchedule</td><td colspan="1" rowspan="1">Find a job's schedule given a job's identifier</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.ImportConfiguration</td><td colspan="1" rowspan="1">Import configuration as written by a previous ExportConfiguration command</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.ListJobStatuses</td><td colspan="1" rowspan="1">List the status of all jobs</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.ListJobs</td><td colspan="1" rowspan="1">List the identifiers for all jobs</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.PauseJob</td><td colspan="1" rowspan="1">Given a job identifier, pause the specified job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RestartJob</td><td colspan="1" rowspan="1">Given a job identifier, restart the specified job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RunDocumentStatus</td><td colspan="1" rowspan="1">Run a document status report</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RunMaxActivityHistory</td><td colspan="1" rowspan="1">Run a maximum activity report</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RunMaxBandwidthHistory</td><td colspan="1" rowspan="1">Run a maximum bandwidth report</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RunQueueStatus</td><td colspan="1" rowspan="1">Run a queue status report</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RunResultHistory</td><td colspan="1" rowspan="1">Run a result history report</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.RunSimpleHistory</td><td colspan="1" rowspan="1">Run a simple history report</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.StartJob</td><td colspan="1" rowspan="1">Start a job</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.WaitForJobDeleted</td><td colspan="1" rowspan="1">After a job has been deleted, wait until the delete has completed</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.WaitForJobInactive</td><td colspan="1" rowspan="1">After a job has been started or aborted, wait until the job ceases all activity</td>
+</tr>
+          
+<tr>
+<td colspan="1" rowspan="1">org.apache.manifoldcf.crawler.WaitJobPaused</td><td colspan="1" rowspan="1">After a job has been paused, wait for the pause to take effect</td>
+</tr>
+        
+</table>
+<p></p>
+<a name="N109C4"></a><a name="Control+by+direct+code"></a>
+<h3 class="h4">Control by direct code</h3>
+<p></p>
+<p>Control by direct Java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea how to proceed, if that's what you want to do.</p>
+<p></p>
+<p></p>
+<a name="N109D4"></a><a name="Caveats"></a>
+<h3 class="h4">Caveats</h3>
+<p></p>
+<p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
+<p></p>
+<p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper, without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
+</div>
+  
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2009, 2010 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

Added: incubator/lcf/site/publish/programmatic-operation.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/programmatic-operation.pdf?rev=1034600&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/lcf/site/publish/programmatic-operation.pdf
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/lcf/site/publish/who.pdf
URL: http://svn.apache.org/viewvc/incubator/lcf/site/publish/who.pdf?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
Binary files - no diff available.

Modified: incubator/lcf/site/src/documentation/content/xdocs/developer-resources.xml
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/content/xdocs/developer-resources.xml?rev=1034600&r1=1034599&r2=1034600&view=diff
==============================================================================
--- incubator/lcf/site/src/documentation/content/xdocs/developer-resources.xml (original)
+++ incubator/lcf/site/src/documentation/content/xdocs/developer-resources.xml Fri Nov 12 23:32:51 2010
@@ -28,7 +28,7 @@
 
     <section id="howtointegrate">
           <title>How to Integrate</title>
-          <p>ManifoldCF provides a number of API's and services.  Documentation of these API's can be found <a href="http://cwiki.apache.org/confluence/display/CONNECTORS/Programmatic+Operation+of+ManifoldCF">here</a>.
+          <p>ManifoldCF provides a number of API's and services.  Documentation of these API's can be found <a href="programmatic-operation.html">here</a>.
           </p>
     </section>
     

Added: incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml
URL: http://svn.apache.org/viewvc/incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml?rev=1034600&view=auto
==============================================================================
--- incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml (added)
+++ incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml Fri Nov 12 23:32:51 2010
@@ -0,0 +1,334 @@
+<?xml version="1.0"?>
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" 
+          "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document> 
+
+  <header> 
+    <title>Programmatic Operation</title> 
+  </header> 
+
+  <body> 
+    <section>
+      <title>Programmatic Operation</title>
+      <p></p>
+      <p>A certain subset of ManifoldCF users want to think of ManifoldCF as an engine that they can poke from whatever other system they are developing.  While ManifoldCF is not precisely a document indexing engine per se, it can certainly be controlled programmatically.  Right now, there are three principal ways of achieving this control.</p>
+      <p></p>
+      <section>
+        <title>Control by Servlet API</title>
+        <p></p>
+        <p>ManifoldCF provides a servlet-based JSON API that gives you the complete ability to define connections and jobs, and control job execution.  You can read about JSON <a href="http://www.json.org">here</a>.  The API is designed to be RESTful in character.  Thus, it makes full use of the HTTP verbs GET, PUT, POST, and DELETE, and represents objects as URLs.  The basic format of the JSON servlet resource URLs is as follows:</p>
+        <p></p>
+        <p>http[s]://<em>&lt;server_and_port&gt;</em>/mcf-api-service/json/<em>&lt;resource&gt;</em></p>
+        <p></p>
+        <p>The servlet ignores request data, except when the PUT or POST verb is used.  In that case, the request data is presumed to be a JSON object.  The servlet responds either with an error response code (either 400 or 500) with an appropriate explanatory message, or with a 200 (OK), 201 (CREATED), or 404 (NOT FOUND) response code along with a response JSON object.</p>
+        <p></p>
+        <p>The actual available resources and commands are as follows:</p>
+        <p></p>
+        <p></p>
+        <p></p>
+        <table>
+          <tr><th>Resource</th><th>Verb</th><th>What it does</th><th>Input format</th><th>Output format</th></tr>
+          <tr><td>outputconnectors</td><td>GET</td><td>List all registered output connectors</td><td>N/A</td><td>{"outputconnector":[<em>&lt;list_of_output_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>authorityconnectors</td><td>GET</td><td>List all registered authority connectors</td><td>N/A</td><td>{"authorityconnector":[<em>&lt;list_of_authority_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>repositoryconnectors</td><td>GET</td><td>List all registered repository connectors</td><td>N/A</td><td>{"repositoryconnector":[<em>&lt;list_of_repository_connector_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>outputconnections</td><td>GET</td><td>List all output connections</td><td>N/A</td><td>{"outputconnection":[<em>&lt;list_of_output_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific output connection</td><td>N/A</td><td>{"outputconnection":<em>&lt;output_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create an output connection</td><td>{"outputconnection":<em>&lt;output_connection_object&gt;</em>}</td><td>{"connection_name":<em>&lt;connection_name&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete an output connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>status/outputconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of an output connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>info/outputconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td>GET</td><td>Retrieve arbitrary connector-specific resource</td><td>N/A</td><td><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>authorityconnections</td><td>GET</td><td>List all authority connections</td><td>N/A</td><td>{"authorityconnection":[<em>&lt;list_of_authority_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific authority connection</td><td>N/A</td><td>{"authorityconnection":<em>&lt;authority_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create an authority connection</td><td>{"authorityconnection":<em>&lt;authority_connection_object&gt;</em>}</td><td>{"connection_name":<em>&lt;connection_name&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete an authority connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>status/authorityconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of an authority connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>repositoryconnections</td><td>GET</td><td>List all repository connections</td><td>N/A</td><td>{"repositoryconnection":[<em>&lt;list_of_repository_connection_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Get a specific repository connection</td><td>N/A</td><td>{"repositoryconnection":<em>&lt;repository_connection_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>PUT</td><td>Save or create a repository connection</td><td>{"repositoryconnection":<em>&lt;repository_connection_object&gt;</em>}</td><td>{"connection_name":<em>&lt;connection_name&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>DELETE</td><td>Delete a repository connection</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>status/repositoryconnections/<em>&lt;encoded_connection_name&gt;</em></td><td>GET</td><td>Check the status of a repository connection</td><td>N/A</td><td>{"check_result":<em>&lt;message&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>info/repositoryconnections/<em>&lt;encoded_connection_name&gt;</em>/<em>&lt;connector_specific_resource&gt;</em></td><td>GET</td><td>Retrieve arbitrary connector-specific resource</td><td>N/A</td><td><em>&lt;response_data&gt;</em> <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} <strong>OR</strong> {"service_interruption":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobs</td><td>GET</td><td>List all job definitions</td><td>N/A</td><td>{"job":[<em>&lt;list_of_job_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobs</td><td>POST</td><td>Create a job</td><td>{"job":<em>&lt;job_object&gt;</em>}</td><td>{"job_id":<em>&lt;job_identifier&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobs/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job definition</td><td>N/A</td><td>{"job":<em>&lt;job_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobs/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Save a job definition</td><td>{"job":<em>&lt;job_object&gt;</em>}</td><td>{"job_id":<em>&lt;job_identifier&gt;</em>} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobs/<em>&lt;job_id&gt;</em></td><td>DELETE</td><td>Delete a job definition</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobstatuses</td><td>GET</td><td>List all jobs and their status</td><td>N/A</td><td>{"job":[<em>&lt;list_of_job_status_objects&gt;</em>]} <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>jobstatuses/<em>&lt;job_id&gt;</em></td><td>GET</td><td>Get a specific job's status</td><td>N/A</td><td>{"jobstatus":<em>&lt;job_status_object&gt;</em>} <strong>OR</strong> { } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>} </td></tr>
+          <tr><td>start/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Start a specified job manually</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>abort/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Abort a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>restart/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Stop and start a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>pause/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Pause a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+          <tr><td>resume/<em>&lt;job_id&gt;</em></td><td>PUT</td><td>Resume a specified job</td><td>N/A</td><td>{ } <strong>OR</strong> {"error":<em>&lt;error_text&gt;</em>}</td></tr>
+        </table>
+        <p></p>
+        <p>Other resources having to do with reports are planned, but have not yet been implemented.</p>
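+        <p>As a rough illustration of the job-control rows in the table above, the following Python sketch maps an action name to its HTTP method and relative resource path, and checks a decoded response for the "error" field.  The function names are hypothetical and not part of ManifoldCF, and the base URL of the API service is deliberately left out.</p>

```python
# Sketch only: helper names are hypothetical, not part of ManifoldCF.
JOB_ACTIONS = ("start", "abort", "restart", "pause", "resume")


def job_control_request(action, job_id):
    """Return the (method, relative_path) pair for a job-control action."""
    if action not in JOB_ACTIONS:
        raise ValueError("unknown job action: %r" % (action,))
    # Every job-control resource in the table is invoked with PUT and no input.
    return ("PUT", "%s/%s" % (action, job_id))


def is_error(response):
    """True when a decoded JSON response carries the "error" field."""
    return isinstance(response, dict) and "error" in response
```

+        <p>A caller would join the returned path onto whatever base URL its own deployment exposes, then pass the decoded JSON reply to is_error().</p>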
+        <p></p>
+        <section>
+          <title>Output connector objects</title>
+          <p></p>
+          <p>The JSON fields an output connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Authority connector objects</title>
+          <p></p>
+          <p>The JSON fields an authority connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Repository connector objects</title>
+          <p></p>
+          <p>The JSON fields a repository connector object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"description"</td><td>The optional description of the connector</td></tr>
+            <tr><td>"class_name"</td><td>The class name of the class implementing the connector</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Output connection objects</title>
+          <p></p>
+          <p>Output connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The URL should be encoded using standard URL utf-8-based %-encoding.</li>
+          </ol>
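+          <p>The three encoding steps above can be sketched in Python as follows.  The function name is hypothetical.  Percent-encoding the '+' produced by the second step (as %2B) is an assumption of this sketch; it relies on the server applying standard URL decoding.</p>

```python
from urllib.parse import quote


def encode_connection_name(name):
    """Encode a connection name for use as one URL path segment (sketch)."""
    # Step 1: double every '.' first, so the '.' added in step 2 stays unambiguous.
    escaped = name.replace(".", "..")
    # Step 2: replace every '/' with '.+'.
    escaped = escaped.replace("/", ".+")
    # Step 3: UTF-8 percent-encoding; safe="" means '+' is encoded too.
    return quote(escaped, safe="", encoding="utf-8")
```

+          <p>The same encoding applies to authority and repository connection names.</p>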
+          <p></p>
+          <p>The JSON fields an output connection object has are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Authority connection objects</title>
+          <p></p>
+          <p>Authority connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The URL should be encoded using standard URL utf-8-based %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields for an authority connection object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+          </table>
+          <p></p>
+        </section>
+        <section>
+          <title>Repository connection objects</title>
+          <p></p>
+          <p>Repository connection names, when they are part of a URL, should be encoded as follows:</p>
+          <p></p>
+          <ol>
+            <li>All instances of '.' should be replaced by '..'.</li>
+            <li>All instances of '/' should be replaced by '.+'.</li>
+            <li>The URL should be encoded using standard URL utf-8-based %-encoding.</li>
+          </ol>
+          <p></p>
+          <p>The JSON fields for a repository connection object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"name"</td><td>The unique name of the connection</td></tr>
+            <tr><td>"description"</td><td>The description of the connection</td></tr>
+            <tr><td>"class_name"</td><td>The Java class name of the class implementing the connection</td></tr>
+            <tr><td>"max_connections"</td><td>The total number of outstanding connections allowed to exist at a time</td></tr>
+            <tr><td>"configuration"</td><td>The configuration object for the connection, which is specific to the connection class</td></tr>
+            <tr><td>"acl_authority"</td><td>The (optional) name of the authority that will enforce security for this connection</td></tr>
+            <tr><td>"throttle"</td><td>An array of throttle objects, which control how quickly documents can be requested from this connection</td></tr>
+          </table>
+          <p></p>
+          <p>Each throttle object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"match"</td><td>The regular expression which is used to match a document's bins to determine if the throttle should be applied</td></tr>
+            <tr><td>"match_description"</td><td>Optional text describing the meaning of the throttle</td></tr>
+            <tr><td>"rate"</td><td>The maximum fetch rate to use if the throttle applies, in fetches per minute</td></tr>
+          </table>
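+          <p>To make the field tables above concrete, here is a minimal Python sketch that assembles a repository connection object with one throttle and wraps it in the "repositoryconnection" envelope used by the PUT resource.  The connection name, class name, and all values are made up for illustration, the empty configuration object is a placeholder, and sending numeric values as strings is an assumption that this page does not confirm.</p>

```python
import json


def make_throttle(match, rate, description=None):
    """Build a throttle object; "match_description" is optional."""
    throttle = {"match": match, "rate": rate}
    if description is not None:
        throttle["match_description"] = description
    return throttle


connection = {
    "name": "Example repository",                      # hypothetical name
    "description": "Illustrative connection",
    "class_name": "com.example.HypotheticalConnector",  # not a real connector
    "max_connections": "10",                            # assumption: string value
    "configuration": {},                                # connector-specific, left empty
    "throttle": [make_throttle(".*", "60", "All bins, 60 fetches per minute")],
}

# Body for a PUT to repositoryconnections/<encoded_connection_name>.
body = json.dumps({"repositoryconnection": connection})
```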
+          <p></p>
+        </section>
+        <section>
+          <title>Job objects</title>
+          <p></p>
+          <p>The JSON fields for a job are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"id"</td><td>The job's identifier, if present.  If not present, ManifoldCF will create one (and will also create the job when saved).</td></tr>
+            <tr><td>"description"</td><td>Text describing the job</td></tr>
+            <tr><td>"repository_connection"</td><td>The name of the repository connection to use with the job</td></tr>
+            <tr><td>"output_connection"</td><td>The name of the output connection to use with the job</td></tr>
+            <tr><td>"document_specification"</td><td>The document specification object for the job, whose format is repository-connection specific</td></tr>
+            <tr><td>"output_specification"</td><td>The output specification object for the job, whose format is output-connection specific</td></tr>
+            <tr><td>"start_mode"</td><td>The start mode for the job, which can be one of "schedule window start", "schedule window anytime", or "manual"</td></tr>
+            <tr><td>"run_mode"</td><td>The run mode for the job, which can be either "continuous" or "scan once"</td></tr>
+            <tr><td>"hopcount_mode"</td><td>The hopcount mode for the job, which can be one of "accurate", "no delete", or "never delete"</td></tr>
+            <tr><td>"priority"</td><td>The job's priority, typically "5"</td></tr>
+            <tr><td>"recrawl_interval"</td><td>The default time between recrawl of documents (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td></tr>
+            <tr><td>"expiration_interval"</td><td>The time until a document expires (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td></tr>
+            <tr><td>"reseed_interval"</td><td>The time between reseeding operations (if the job is "continuous"), in milliseconds, or "infinite" for infinity</td></tr>
+            <tr><td>"hopcount"</td><td>An array of hopcount objects, describing the link types and associated maximum hops permitted for the job</td></tr>
+            <tr><td>"schedule"</td><td>An array of schedule objects, describing when the job should be started and run</td></tr>
+          </table>
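+          <p>Since three of the job fields above accept only a fixed set of strings, a client may want to check them before submitting a job.  The following Python sketch does that; the function name is hypothetical, and treating an absent field as acceptable is an assumption of the sketch rather than documented API behavior.</p>

```python
# Allowed values, copied from the job field table.
START_MODES = {"schedule window start", "schedule window anytime", "manual"}
RUN_MODES = {"continuous", "scan once"}
HOPCOUNT_MODES = {"accurate", "no delete", "never delete"}


def check_job_modes(job):
    """Return a list of problems found in the mode fields of a job object."""
    problems = []
    checks = (
        ("start_mode", START_MODES),
        ("run_mode", RUN_MODES),
        ("hopcount_mode", HOPCOUNT_MODES),
    )
    for field, allowed in checks:
        value = job.get(field)
        # Assumption: a missing field is left for the server to default.
        if value is not None and value not in allowed:
            problems.append("%s has unsupported value %r" % (field, value))
    return problems
```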
+          <p></p>
+          <p>Each hopcount object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"link_type"</td><td>The connection-type-dependent type of a link for which a hop count restriction is specified</td></tr>
+            <tr><td>"count"</td><td>The maximum number of hops allowed for the associated link type, starting at a seed</td></tr>
+          </table>
+          <p></p>
+          <p>Each schedule object has the following fields:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"timezone"</td><td>The optional time zone for the schedule object; if not present the default server time zone is used</td></tr>
+            <tr><td>"duration"</td><td>The optional length of the described time window, in milliseconds; if not present, duration is considered infinite</td></tr>
+            <tr><td>"dayofweek"</td><td>The optional day-of-the-week enumeration object</td></tr>
+            <tr><td>"monthofyear"</td><td>The optional month-of-the-year enumeration object</td></tr>
+            <tr><td>"dayofmonth"</td><td>The optional day-of-the-month enumeration object</td></tr>
+            <tr><td>"year"</td><td>The optional year enumeration object</td></tr>
+            <tr><td>"hourofday"</td><td>The optional hour-of-the-day enumeration object</td></tr>
+            <tr><td>"minutesofhour"</td><td>The optional minutes-of-the-hour enumeration object</td></tr>
+          </table>
+          <p></p>
+          <p>Each enumeration object describes an array of integers using the form:</p>
+          <p></p>
+          <p>{"value":[<em>&lt;integer_list&gt;</em>]}</p>
+          <p></p>
+          <p>Each integer is a zero-based index describing which entity is being specified.  For example, for "dayofweek", 0 corresponds to Sunday, etc., and thus "dayofweek":{"value":[0,6]} would describe Saturdays and Sundays.</p>
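+          <p>A schedule object built from such enumeration objects might be assembled as in the Python sketch below.  The helper names are hypothetical.  The duration is passed through unchanged, because this page does not state whether it must be a JSON number or a string, and reading hour index 2 as 02:00 is an assumption based on the zero-based rule.</p>

```python
def enumeration(values):
    """Wrap integers in the {"value": [...]} enumeration form."""
    return {"value": sorted(set(values))}


def weekend_schedule(start_hour, duration_ms=None, timezone=None):
    """Schedule object for Saturdays and Sundays at the given hour."""
    schedule = {
        "dayofweek": enumeration([0, 6]),        # 0 is Sunday, so 6 is Saturday
        "hourofday": enumeration([start_hour]),
        "minutesofhour": enumeration([0]),
    }
    # Optional fields are omitted entirely when not supplied.
    if duration_ms is not None:
        schedule["duration"] = duration_ms
    if timezone is not None:
        schedule["timezone"] = timezone
    return schedule
```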
+          <p></p>
+        </section>
+        <section>
+          <title>Job status objects</title>
+          <p></p>
+          <p>The JSON fields of a job status object are as follows:</p>
+          <p></p>
+          <table>
+            <tr><th>Field</th><th>Meaning</th></tr>
+            <tr><td>"job_id"</td><td>The job identifier</td></tr>
+            <tr><td>"status"</td><td>The job status, having the possible values: "not yet run", "running", "paused", "done", "waiting", "starting up", "cleaning up", "error", "aborting", "restarting", "running no connector", and "terminating"</td></tr>
+            <tr><td>"error_text"</td><td>The error text, if the status is "error"</td></tr>
+            <tr><td>"start_time"</td><td>The job start time, in milliseconds since Jan 1, 1970</td></tr>
+            <tr><td>"end_time"</td><td>The job end time, in milliseconds since Jan 1, 1970</td></tr>
+            <tr><td>"documents_in_queue"</td><td>The total number of documents in the queue for the job</td></tr>
+            <tr><td>"documents_outstanding"</td><td>The number of documents for the job that are currently considered 'active'</td></tr>
+            <tr><td>"documents_processed"</td><td>The number of documents in the queue for the job that have been processed at least once</td></tr>
+          </table>
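+          <p>Because the start and end times above are expressed in milliseconds since Jan 1, 1970, a client will usually convert them before display.  The Python sketch below does so for a decoded job status object.  The function names are hypothetical, and the sketch accepts either numbers or numeric strings since the exact JSON type is not stated here.</p>

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_utc(value):
    """Convert milliseconds since Jan 1, 1970 to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(value))


def summarize_job_status(status):
    """Pick out the identifier, status, times, and any error text."""
    summary = {"job_id": status["job_id"], "status": status["status"]}
    for field in ("start_time", "end_time"):
        if status.get(field) is not None:
            summary[field] = epoch_ms_to_utc(status[field])
    # "error_text" is only meaningful when the status is "error".
    if status["status"] == "error":
        summary["error_text"] = status.get("error_text")
    return summary
```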
+          <p></p>
+        </section>
+        <section>
+          <title>Connection-type-specific objects</title>
+          <p></p>
+          <p>As you may note when trying to use the above JSON API methods, you cannot get very far in defining connections or jobs without knowing the JSON format of a connection's configuration information, or a job's connection-specific document specification and output specification information.  The form of these objects is controlled by the Java implementation of the underlying connector, and is translated directly into JSON, so if you write your own connector you should be able to figure out what it will be in the API.  For connectors already part of ManifoldCF, these connector-specific objects still need to be documented.  That work has not yet started.</p>
+          <p></p>
+          <p>Luckily, it is pretty easy to learn a lot about the objects in question by simply creating connections and jobs in the ManifoldCF crawler UI, and then inspecting the resulting JSON objects through the API.  In this way, it should be possible to do a decent job of coding most API-based integrations.  The one place where difficulties will certainly occur is if you try to completely replace the ManifoldCF crawler UI with one of your own.  This is because most connectors have methods that communicate with their respective back-ends in order to allow the user to select appropriate values.  For example, the path drill-down that is presented by the LiveLink connector requires that the connector interrogate the appropriate LiveLink repository in order to populate its path selection pull-downs.  There is, at this time, only one sanctioned way to accomplish the same job using the API, which is to use the appropriate "<em>connection_type</em>/execute/<em>type-specific_command</em>" command to perform the necessary functions.  Some set of useful functions has been coded for every appropriate connector, but the exact commands for every connector, and their JSON syntax, remain undocumented for now.</p>
+          <p></p>
+        </section>
+        <section>
+          <title>File system connector</title>
+          <p></p>
+          <p>The file system connector has no configuration information, and no connector-specific commands.  However, it does have document specification information.  The information looks something like this:</p>
+          <p></p>
+          <p>{"startpoint":[{"_attribute_path":"c:\\path_to_files","include":[{"_attribute_type":"file","_attribute_match":"*.txt"},{"_attribute_type":"file","_attribute_match":"*.doc"},{"_attribute_type":"directory","_attribute_match":"*"}],"exclude":["*.mov"]}]}</p>
+          <p></p>
+          <p>As you can see, multiple starting paths are possible, and the inclusion and exclusion rules also can be one or multiple.</p>
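+          <p>A document specification of this shape can be generated rather than typed by hand, which avoids bracket and escaping mistakes.  The Python sketch below mirrors the example above; the helper names are hypothetical, and the bare-string form of the "exclude" entries simply follows that example rather than any verified schema.</p>

```python
import json


def rule(kind, pattern):
    """One inclusion rule: kind is "file" or "directory"."""
    return {"_attribute_type": kind, "_attribute_match": pattern}


def startpoint(path, include, exclude):
    """One starting path with its inclusion and exclusion rules."""
    return {"_attribute_path": path, "include": include, "exclude": exclude}


spec = {
    "startpoint": [
        startpoint(
            "c:\\path_to_files",  # a single backslash in the actual path
            [rule("file", "*.txt"), rule("file", "*.doc"), rule("directory", "*")],
            ["*.mov"],
        )
    ]
}

# json.dumps escapes the backslash, so the output is valid JSON.
text = json.dumps(spec)
```

+          <p>Further starting paths are added by appending more startpoint() results to the list.</p>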
+          <p></p>
+          <p></p>
+        </section>
+      </section>
+      <section>
+        <title>Control via Commands</title>
+        <p></p>
+        <p>For script writers, there currently exist a number of ManifoldCF execution commands.  These commands mainly cover defining connections and jobs, controlling jobs, and running reports.  The following table lists the current suite.</p>
+        <p></p>
+        <table>
+          <tr><th>Command</th><th>What it does</th></tr>
+          <tr><td>org.apache.manifoldcf.agents.DefineOutputConnection</td><td>Create a new output connection</td></tr>
+          <tr><td>org.apache.manifoldcf.agents.DeleteOutputConnection</td><td>Delete an existing output connection</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.ChangeAuthSpec</td><td>Modify an authority's configuration information</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.CheckAll</td><td>Check all authorities to be sure they are functioning</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.DefineAuthorityConnection</td><td>Create a new authority connection</td></tr>
+          <tr><td>org.apache.manifoldcf.authorities.DeleteAuthorityConnection</td><td>Delete an existing authority connection</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.AbortJob</td><td>Abort a running job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.AddScheduledTime</td><td>Add a schedule record to a job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ChangeJobDocSpec</td><td>Modify a job's specification information</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DefineJob</td><td>Create a new job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DefineRepositoryConnection</td><td>Create a new repository connection</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DeleteJob</td><td>Delete an existing job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.DeleteRepositoryConnection</td><td>Delete an existing repository connection</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ExportConfiguration</td><td>Write the complete list of all connection definitions and job specifications to a file</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.FindJob</td><td>Locate a job identifier given a job's name</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.GetJobSchedule</td><td>Find a job's schedule given a job's identifier</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ImportConfiguration</td><td>Import configuration as written by a previous ExportConfiguration command</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ListJobStatuses</td><td>List the status of all jobs</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.ListJobs</td><td>List the identifiers for all jobs</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.PauseJob</td><td>Given a job identifier, pause the specified job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RestartJob</td><td>Given a job identifier, restart the specified job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunDocumentStatus</td><td>Run a document status report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunMaxActivityHistory</td><td>Run a maximum activity report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunMaxBandwidthHistory</td><td>Run a maximum bandwidth report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunQueueStatus</td><td>Run a queue status report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunResultHistory</td><td>Run a result history report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.RunSimpleHistory</td><td>Run a simple history report</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.StartJob</td><td>Start a job</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.WaitForJobDeleted</td><td>After a job has been deleted, wait until the delete has completed</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.WaitForJobInactive</td><td>After a job has been started or aborted, wait until the job ceases all activity</td></tr>
+          <tr><td>org.apache.manifoldcf.crawler.WaitJobPaused</td><td>After a job has been paused, wait for the pause to take effect</td></tr>
+        </table>
+        <p></p>
+      </section>
+      <section>
+        <title>Control by direct code</title>
+        <p></p>
+        <p>Control by direct Java code is quite a reasonable thing to do.  The sources of the above commands should give a pretty clear idea of how to proceed, if that's what you want to do.</p>
+        <p></p>
+        <p></p>
+      </section>
+      <section>
+        <title>Caveats</title>
+        <p></p>
+        <p>The above commands know nothing about the differences between connection types.  Instead, they deal with configuration and specification information in the form of XML documents.  Normally, these XML documents are hidden from a system integrator, unless they happen to look into the database with a tool such as psql.  But the API commands above often will require such XML documents to be included as part of the command execution.</p>
+        <p></p>
+        <p>This has one major consequence.  Any application that would manipulate connections and jobs directly cannot be connection-type independent - these applications must know the proper form of XML to submit to the command.  So, it is not possible to use these command APIs to write one's own UI wrapper, without sacrificing some of the repository independence that ManifoldCF by itself maintains.</p>
+      </section>
+    </section>
+  </body>
+
+</document>

Propchange: incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml
------------------------------------------------------------------------------
    svn:eol-style = native

Propchange: incubator/lcf/site/src/documentation/content/xdocs/programmatic-operation.xml
------------------------------------------------------------------------------
    svn:keywords = Id


