manifoldcf-commits mailing list archives

Subject [CONF] Lucene Connector Framework > Programmatic Operation of LCF
Date Wed, 12 May 2010 19:25:00 GMT
Space: Lucene Connector Framework
Page: Programmatic Operation of LCF

Added by Karl Wright:
h1. Programmatic Operation of LCF

A certain subset of LCF users want to think of LCF as an engine that they can poke from whatever
other system they are developing.  While LCF is not precisely a document indexing engine per
se, it can certainly be controlled programmatically.  Right now, there are two principal ways
of achieving this control.

h3. Control via Commands

For script writers, there currently exist a number of LCF execution commands.  These commands
mainly cover defining connections and jobs, controlling jobs, and running reports.  The
following table lists the current suite.

|| Command || What it does ||
| org.apache.lcf.agents.DefineOutputConnection | Create a new output connection |
| org.apache.lcf.agents.DeleteOutputConnection | Delete an existing output connection |
| org.apache.lcf.authorities.ChangeAuthSpec | Modify an authority's configuration information |
| org.apache.lcf.authorities.CheckAll | Check all authorities to be sure they are functioning |
| org.apache.lcf.authorities.DefineAuthorityConnection | Create a new authority connection |
| org.apache.lcf.authorities.DeleteAuthorityConnection | Delete an existing authority connection |
| org.apache.lcf.crawler.AbortJob | Abort a running job |
| org.apache.lcf.crawler.AddScheduledTime | Add a schedule record to a job |
| org.apache.lcf.crawler.ChangeJobDocSpec | Modify a job's specification information |
| org.apache.lcf.crawler.DefineJob | Create a new job |
| org.apache.lcf.crawler.DefineRepositoryConnection | Create a new repository connection |
| org.apache.lcf.crawler.DeleteJob | Delete an existing job |
| org.apache.lcf.crawler.DeleteRepositoryConnection | Delete an existing repository connection |
| org.apache.lcf.crawler.ExportConfiguration | Write the complete list of all connection definitions and job specifications to a file |
| org.apache.lcf.crawler.FindJob | Locate a job identifier given a job's name |
| org.apache.lcf.crawler.GetJobSchedule | Find a job's schedule given a job's identifier |
| org.apache.lcf.crawler.ImportConfiguration | Import configuration as written by a previous ExportConfiguration command |
| org.apache.lcf.crawler.ListJobStatuses | List the status of all jobs |
| org.apache.lcf.crawler.ListJobs | List the identifiers for all jobs |
| org.apache.lcf.crawler.PauseJob | Given a job identifier, pause the specified job |
| org.apache.lcf.crawler.RestartJob | Given a job identifier, restart the specified job |
| org.apache.lcf.crawler.RunDocumentStatus | Run a document status report |
| org.apache.lcf.crawler.RunMaxActivityHistory | Run a maximum activity report |
| org.apache.lcf.crawler.RunMaxBandwidthHistory | Run a maximum bandwidth report |
| org.apache.lcf.crawler.RunQueueStatus | Run a queue status report |
| org.apache.lcf.crawler.RunResultHistory | Run a result history report |
| org.apache.lcf.crawler.RunSimpleHistory | Run a simple history report |
| org.apache.lcf.crawler.StartJob | Start a job |
| org.apache.lcf.crawler.WaitForJobDeleted | After a job has been deleted, wait until the delete has completed |
| org.apache.lcf.crawler.WaitForJobInactive | After a job has been started or aborted, wait until the job ceases all activity |
| org.apache.lcf.crawler.WaitJobPaused | After a job has been paused, wait for the pause to take effect |
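
As a rough illustration of scripting these commands from another Java program, the sketch below launches a command class in a child JVM.  It is only a sketch under stated assumptions: the class {{LcfCommandRunner}}, the classpath {{lib/*}}, passing the job identifier as a single command-line argument, and treating exit code 0 as success are all hypothetical and are not taken from the LCF sources.  Check each command's source for its real arguments.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LcfCommandRunner {
    // Hypothetical: classpath that contains the LCF jars; adjust for your installation.
    private final String classpath;

    public LcfCommandRunner(String classpath) {
        this.classpath = classpath;
    }

    /** Builds the argument vector for launching one command class in a child JVM. */
    public List<String> buildCommandLine(String commandClass, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(commandClass);
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    /** Runs the command, forwards its output to this process, and returns its exit code. */
    public int run(String commandClass, String... args)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommandLine(commandClass, args));
        pb.inheritIO();
        return pb.start().waitFor();
    }

    public static void main(String[] argv) throws Exception {
        if (argv.length != 1) {
            System.err.println("Usage: LcfCommandRunner <job identifier>");
            System.exit(2);
        }
        LcfCommandRunner runner = new LcfCommandRunner("lib/*"); // assumed classpath
        String jobId = argv[0]; // assumed: the commands accept the job identifier as an argument

        // Start the job, then block until it has ceased all activity.
        int rc = runner.run("org.apache.lcf.crawler.StartJob", jobId);
        if (rc == 0) { // assumed: exit code 0 signals success
            rc = runner.run("org.apache.lcf.crawler.WaitForJobInactive", jobId);
        }
        System.exit(rc);
    }
}
```

Pairing a control command with its matching wait command, as {{main}} does here, lets a script proceed only once the requested state change has actually taken effect.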

h3. Control by Direct Code

Control by direct Java code is quite a reasonable thing to do.  The sources of the above commands
should give a pretty clear idea of how to proceed, if that's what you want to do.

h3. Control by Servlet API

It has been proposed that all of the above control operations be made executable via an HTTP
command interface.  No such interface exists yet.

h3. Caveats

The existing commands know nothing about the differences between connection types.  Instead,
they deal with configuration and specification information in the form of XML documents.
Normally, these XML documents are hidden from a system integrator, unless they happen to look
into the database with a tool such as psql.  But the commands above will often require
such XML documents to be supplied as part of the command execution.

This has one major consequence.  Any application that manipulates connections and jobs
directly cannot be connection-type independent: it must know the proper form of XML to submit
to each command.  So, it is not possible to use these commands to write one's own UI wrapper
without sacrificing some of the generality that LCF by itself maintains.
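
To make that consequence concrete, here is a minimal sketch of what a wrapper application ends up carrying: one XML builder per connection type it supports.  Everything in it is hypothetical, including the class {{ConnectionXmlRegistry}}, the type name {{exampleType}}, and the {{exampleConfig}} element.  None of it reflects the actual XML that LCF expects; that format has to be taken from the connector in question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ConnectionXmlRegistry {
    // Maps a connection type name to the code that renders its XML from UI-level settings.
    private final Map<String, Function<Map<String, String>, String>> builders = new HashMap<>();

    /** Registers the XML builder for one connection type. */
    public void register(String connectionType,
                         Function<Map<String, String>, String> builder) {
        builders.put(connectionType, builder);
    }

    /** Renders the XML for a connection type, failing if the type is unknown to the wrapper. */
    public String buildXml(String connectionType, Map<String, String> settings) {
        Function<Map<String, String>, String> builder = builders.get(connectionType);
        if (builder == null) {
            throw new IllegalArgumentException(
                "No XML builder registered for connection type: " + connectionType);
        }
        return builder.apply(settings);
    }

    public static void main(String[] args) {
        ConnectionXmlRegistry registry = new ConnectionXmlRegistry();

        // Hypothetical type and element names, for illustration only (not LCF's real format).
        registry.register("exampleType",
            s -> "<exampleConfig server=\"" + s.get("server") + "\"/>");

        Map<String, String> settings = new HashMap<>();
        settings.put("server", "localhost");
        System.out.println(registry.buildXml("exampleType", settings));
    }
}
```

Any connection type without a registered builder is rejected, which is exactly the loss of generality described above: the wrapper only works for the types it has been taught.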
