Mailing-List: contact connectors-commits-help@incubator.apache.org;
 run by ezmlm
Precedence: bulk
Reply-To: connectors-dev@incubator.apache.org
Date: Fri, 19 Feb 2010 21:18:00 +0000 (UTC)
From: confluence@apache.org
To: connectors-commits@incubator.apache.org
Message-ID: <527215892.1027.1266614280097.JavaMail.www-data@brutus.apache.org>
Subject: [CONF] Lucene Connector Framework > How to Build and Deploy Lucene
 Connector Framework
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Auto-Submitted: auto-generated

Space: Lucene Connector Framework (http://cwiki.apache.org/confluence/display/CONNECTORS)
Page: How to Build and Deploy Lucene Connector Framework (http://cwiki.apache.org/confluence/display/CONNECTORS/How+to+Build+and+Deploy+Lucene+Connector+Framework)


Edited by Karl Wright:
---------------------------------------------------------------------
h1. Building LCF

To build Lucene Connector Framework, and the particular connectors you are interested in, you currently need to do the following:

# Check out [https://svn.apache.org/repos/asf/incubator/lcf/trunk].
# cd to "modules".
# Install any desired LGPL and proprietary dependencies (libraries, wsdls, and xsds).  See below for details.
# Run ant.
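
The four steps above can be scripted. The following Python sketch assumes that the svn and ant command-line tools are on the PATH. The checkout directory name "lcf-trunk" is an arbitrary choice, not part of these instructions. Checkout and build are kept as separate functions so that optional dependencies can be copied under "modules" in between.

```python
import subprocess
from pathlib import Path

# Repository location from the build instructions.
TRUNK_URL = "https://svn.apache.org/repos/asf/incubator/lcf/trunk"


def checkout_command(checkout_dir):
    """Step 1: the svn command that checks out trunk into checkout_dir."""
    return ["svn", "checkout", TRUNK_URL, checkout_dir]


def modules_dir(checkout_dir):
    """Step 2: the "modules" directory that ant must be run from."""
    return Path(checkout_dir) / "modules"


def checkout(checkout_dir="lcf-trunk"):
    subprocess.run(checkout_command(checkout_dir), check=True)


def build(checkout_dir="lcf-trunk"):
    """Step 4: run ant's default target inside "modules".

    Step 3 (copying LGPL or proprietary dependencies) must already be done.
    """
    subprocess.run(["ant"], cwd=modules_dir(checkout_dir), check=True)
```

Call checkout(), copy in whatever dependencies you need, then call build(). On Windows, Ant is started through a batch file, so the command may need to be given as "ant.bat" instead of "ant".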

If you supply *no* LGPL or proprietary libraries, the framework itself and only the following repository connectors will be built:

* Filesystem connector
* JDBC connector, with only the PostgreSQL JDBC driver
* RSS connector
* Webcrawler connector

In addition, the following output connectors will be built:

* MetaCarta GTS output connector
* Lucene SOLR output connector
* Null output connector

The LGPL and proprietary connector dependencies are described below.

h2. Building the Documentum connector

The Documentum connector requires EMC's DFC product in order to be built.  Install DFC on the build system, and locate the jars it installs.  You will need to copy at least dfc.jar, dfcbase.jar, and dctm.jar into the directory "modules/connectors/documentum/dfc".

h2. Building the FileNet connector

The FileNet connector requires IBM's FileNet P8 API jar in order to be built.  Install the FileNet P8 API on the build system, and copy at least "Jace.jar" from that installation into "modules/connectors/filenet/filenet-api".

h2. Building the JDBC connector, including Oracle, SQLServer, or Sybase JDBC drivers

The JDBC connector also knows how to work with Oracle, SQLServer, and Sybase JDBC drivers.  For Oracle, download the appropriate Oracle JDBC jar from the Oracle site, and copy it into the directory "modules/connectors/jdbc/jdbc-drivers".  For SQLServer and Sybase, download jtds.jar, and copy it into the same directory.

h2. Building the jCIFS connector

To build this connector, download jcifs.jar from http://jcifs.samba.org, and copy it into the "modules/connectors/jcifs/jcifs" directory.

h2. Building the LiveLink connector

This connector needs LAPI, a proprietary Java library that provides access to OpenText's LiveLink server.  Copy lapi.jar into the "modules/connectors/livelink/lapi" directory.

h2. Building the Memex connector

This connector needs the Memex API jar, usually called JavaMXIELIB.jar.  Copy this jar into the "modules/connectors/memex/mxie-java" directory.
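
Before running ant, it can help to confirm that the jar files named in the sections above are where the build expects them. The following Python sketch only checks the documented paths relative to "modules"; it does not know how the Ant build itself decides which connectors to compile. The Oracle JDBC jar is left out because its file name depends on the release you download, and the Memex jar name is only the usual one, which may differ in your installation.

```python
from pathlib import Path

# Documented jar locations, relative to the "modules" directory.
REQUIRED_JARS = {
    "Documentum": [
        "connectors/documentum/dfc/dfc.jar",
        "connectors/documentum/dfc/dfcbase.jar",
        "connectors/documentum/dfc/dctm.jar",
    ],
    "FileNet": ["connectors/filenet/filenet-api/Jace.jar"],
    "JDBC (SQLServer/Sybase)": ["connectors/jdbc/jdbc-drivers/jtds.jar"],
    "jCIFS": ["connectors/jcifs/jcifs/jcifs.jar"],
    "LiveLink": ["connectors/livelink/lapi/lapi.jar"],
    "Memex": ["connectors/memex/mxie-java/JavaMXIELIB.jar"],
}


def missing_jars(modules_dir):
    """Return {connector: [missing jar paths]} for connectors whose jars are absent."""
    root = Path(modules_dir)
    report = {}
    for connector, jars in REQUIRED_JARS.items():
        missing = [jar for jar in jars if not (root / jar).is_file()]
        if missing:
            report[connector] = missing
    return report
```

For example, missing_jars("lcf-trunk/modules") lists every connector from the sections above that still lacks one of its documented jars; an empty result means all of them are in place.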

h2. Building the Meridio connector

The Meridio connector needs wsdls and xsds downloaded from an installed Meridio instance using *disco.exe*, which is installed as part of Microsoft Visual Studio, typically under "c:\Program Files\Microsoft SDKs\Windows\V6.x\bin".  Obtain the preliminary wsdls and xsds by interrogating the following Meridio web services:

 * http\[s\]://<meridio_server>/DMWS/MeridioDMWS.asmx
 * http\[s\]://<meridio_server>/RMWS/MeridioRMWS.asmx

You should have obtained the following files in this step:

 * MeridioDMWS.wsdl
 * MeridioRMWS.wsdl
 * DMDataSet.xsd
 * RMDataSet.xsd
 * RMClassificationDataSet.xsd

Next, patch these files using Microsoft's *xmldiffpatch* utility suite, downloadable for Windows from [http://msdn.microsoft.com/en-us/library/aa302294.aspx].  The diff files to apply as patches are in "modules/connectors/meridio/upstream-diffs".  After patching, rename the two wsdls so that you have the following files:

 * MeridioDMWS_axis.wsdl
 * MeridioRMWS_axis.wsdl
 * DMDataSet.xsd
 * RMDataSet.xsd
 * RMClassificationDataSet.xsd

Finally, copy all of these to: "modules/connectors/meridio/wsdls".
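
Once the xmldiffpatch step is done, the rename and copy can be scripted. This Python sketch assumes the five patched files sit together in one directory under their original names; it does not perform the patching itself.

```python
import shutil
from pathlib import Path

# Patched file name -> name expected in modules/connectors/meridio/wsdls.
# Only the two wsdls are renamed; the xsds keep their names.
MERIDIO_FILES = {
    "MeridioDMWS.wsdl": "MeridioDMWS_axis.wsdl",
    "MeridioRMWS.wsdl": "MeridioRMWS_axis.wsdl",
    "DMDataSet.xsd": "DMDataSet.xsd",
    "RMDataSet.xsd": "RMDataSet.xsd",
    "RMClassificationDataSet.xsd": "RMClassificationDataSet.xsd",
}


def install_meridio_files(patched_dir, modules_dir):
    """Copy the patched files into the Meridio wsdls directory; return the paths written."""
    target = Path(modules_dir) / "connectors" / "meridio" / "wsdls"
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for source_name, target_name in MERIDIO_FILES.items():
        destination = target / target_name
        # copyfile raises FileNotFoundError if a patched file is missing.
        shutil.copyfile(Path(patched_dir) / source_name, destination)
        written.append(destination)
    return written
```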

h2. Building the SharePoint connector

In order to build this connector, you need to download wsdls from an installed SharePoint instance.  The wsdls in question are:

 * Permissions.wsdl
 * Lists.wsdl
 * Dspsts.wsdl
 * usergroup.wsdl
 * versions.wsdl
 * webs.wsdl

To download a wsdl, use Microsoft's *disco.exe* tool, which is part of Visual Studio, typically under "c:\Program Files\Microsoft SDKs\Windows\V6.x\bin".  Interrogate the following URLs:

 * http\[s\]://<server_name>/_vti_bin/Permissions.asmx
 * http\[s\]://<server_name>/_vti_bin/Lists.asmx
 * http\[s\]://<server_name>/_vti_bin/Dspsts.asmx
 * http\[s\]://<server_name>/_vti_bin/usergroup.asmx
 * http\[s\]://<server_name>/_vti_bin/versions.asmx
 * http\[s\]://<server_name>/_vti_bin/webs.asmx
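
The six service URLs follow a single pattern, so they can be generated from the server name. This Python sketch pairs each URL with the wsdl file name listed above; feeding the URLs to disco.exe remains a manual step, and "sp.example.com" is a placeholder server name.

```python
# Service names from the list above; each is served at /_vti_bin/<name>.asmx.
SHAREPOINT_SERVICES = ["Permissions", "Lists", "Dspsts", "usergroup", "versions", "webs"]


def sharepoint_wsdl_sources(server_name, use_https=False):
    """Return (service URL, expected wsdl file name) pairs for one SharePoint server."""
    scheme = "https" if use_https else "http"
    return [
        (f"{scheme}://{server_name}/_vti_bin/{name}.asmx", f"{name}.wsdl")
        for name in SHAREPOINT_SERVICES
    ]


for url, wsdl in sharepoint_wsdl_sources("sp.example.com"):
    print(url, "->", wsdl)
```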

When the wsdl files have been downloaded, copy them to: "modules/connectors/sharepoint/wsdls".

h1. Running Lucene Connector Framework

The core of Lucene Connector Framework consists of the following pieces:

 * A PostgreSQL database, which is where LCF keeps all of its configuration and state information
 * A synchronization directory, which is how LCF coordinates activity among its various processes
 * An *agents* process, which is the process that actually crawls documents and ingests them
 * A *crawler-ui* web application, which presents the UI users interact with to configure and control the crawler
 * An *authorityservice* web application, which responds to requests for authorization tokens, given a user name

In addition, Lucene Connector Framework includes a number of Java classes that are meant to be called directly to perform specific actions in the environment or in the database.  These classes are usually invoked from the command line, with appropriate arguments supplied.  The basic functionality they supply is as follows:

 * Create/Destroy the LCF database instance
 * Start/Stop the *agents* process
 * Register/Unregister an agent class (there's currently only one included)
 * Register/Unregister an output connector
 * Register/Unregister a repository connector
 * Register/Unregister an authority connector
 * Clean up synchronization directory garbage resulting from an ungraceful interruption of an LCF process
 * Query for certain kinds of job-related information

Individual connectors may contribute additional command classes and processes to this picture.  A properly built connector typically consists of:

 * One or more jar files meant to be included in the *agents* process and command invocation classpaths
 * An "iar" incremental war file, which is meant to be unpacked on top of the *crawler-ui* web application
 * Possibly one or two connector-specific processes, each with its own classpath, which isolate the *crawler-ui* web application and *agents* process from problematic aspects of the client environment

A connector package will typically supply an output connector, or a repository connector, or both a repository connector and an authority connector.
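
Unpacking an iar on top of the *crawler-ui* web application might look like the following Python sketch. It assumes that an iar is an ordinary zip archive (as war files are) and that *crawler-ui* has been deployed as an exploded directory; neither is stated on this page.

```python
import zipfile


def overlay_iar(iar_path, webapp_dir):
    """Unpack an iar over an exploded web application directory.

    Files present in the iar overwrite same-named files in webapp_dir;
    files not mentioned in the iar are left untouched. extractall() also
    strips absolute paths and ".." components from entry names.
    Returns the list of entry names in the iar.
    """
    with zipfile.ZipFile(iar_path) as archive:
        archive.extractall(webapp_dir)
        return archive.namelist()
```

Overlaying the same iar twice is harmless, since extraction simply rewrites the same files.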

h2. Running the *agents* process

to be continued

h2. Deploying the *crawler-ui* war

to be continued

h2. Deploying the *authorityservice* war

to be continued

h2. Running commands

|| Command Class || Function ||
| org.apache.lcf.agents.AgentRun | Main *agents* process class |
| org.apache.lcf.agents.AgentStop | Stops the running *agents* process |
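
The command classes are ordinary Java main classes, so running one amounts to a java invocation with the right classpath. The following Python sketch builds such command lines for the two classes in the table. The required classpath is not yet documented on this page, so callers must supply it themselves, and the sketch passes no arguments to either class by default; both points are assumptions.

```python
import os
import subprocess

# Command classes from the table above.
AGENT_RUN = "org.apache.lcf.agents.AgentRun"
AGENT_STOP = "org.apache.lcf.agents.AgentStop"


def java_command(main_class, classpath_entries, *args, java="java"):
    """Build a 'java -cp <classpath> <main_class> [args...]' command line."""
    classpath = os.pathsep.join(classpath_entries)
    return [java, "-cp", classpath, main_class, *args]


def start_agents(classpath_entries):
    # AgentRun is the long-running agents process, so don't wait for it.
    return subprocess.Popen(java_command(AGENT_RUN, classpath_entries))


def stop_agents(classpath_entries):
    # AgentStop stops the running agents process; wait for it to finish.
    subprocess.run(java_command(AGENT_STOP, classpath_entries), check=True)
```

Any jar names passed as classpath_entries, such as a hypothetical "lib/lcf-agents.jar", are placeholders until the required classpath is documented.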

to be continued

