lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "Solr4UIMA" by MogenetiDev
Date Tue, 12 Feb 2013 12:04:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "Solr4UIMA" page has been changed by MogenetiDev:
http://wiki.apache.org/solr/Solr4UIMA?action=diff&rev1=24&rev2=25

  ## page was copied from SolrUIMA
- = Solr UIMA integration =
+ = Solr 4 UIMA Tutorial =
  <!> [[Solr4.1]]
  
  <<TableOfContents>>
  
  Solr UIMA contrib enables enhancing of Solr documents using the Unstructured Information
Management Architecture ([[http://uima.apache.org|UIMA]]).
- UIMA lets you define custom pipelines of Analysis Engines which incrementally add metadata
to the document via annotations.
+ UIMA lets you define custom pipelines of Analysis Engines which incrementally add metadata
to the document via annotations. In this tutorial we first install the Eclipse UIMA tooling,
create a custom UIMA Annotator, test the Annotator using the UIMA CAS Visual Debugger, create
a JAR file for use with Solr 4, and set up Solr to use the Annotator.
+ 
+ == Setup UIMA toolkit in Eclipse ==
+ 
+ More details can be found here:
+ [[http://uima.apache.org/downloads/releaseDocs/2.2.2-incubating/docs/html/overview_and_setup/overview_and_setup.html#ugr.ovv.eclipse_setup]]
+ 
+  1. Install Eclipse Modeling Framework (EMF) from the Eclipse update site
+  2. Install Apache UIMA eclipse tooling from [[http://www.apache.org/dist/uima/eclipse-update-site]]
+  3. Install Apache UIMA from [[http://uima.apache.org/downloads.cgi]]
+  4. Open uimaj-examples (this enables the Run As functionality for tools such as the JCas debugger)
+ 	* File - Import - General / Existing Projects into workspace - Select apache-uima folder
+ 	* This will automatically add uimaj-examples to the workspace
+ 
+ == Create your own UIMA Annotator ==
+ 
+ More details can be found here:
+ [[http://uima.apache.org/doc-uima-annotator.html]]
+ 
+  1. 	Create a new Java project in your Eclipse workspace called RoomNumberAnnotator. To
do this, select "File -> New -> Java Project"
+ 	and use RoomNumberAnnotator as the project name. Also, in the Project Layout section,
make sure the option
+ 	"Create separate folders for sources and class files" is selected.
+  2. 	Add the UIMA nature to the project by right-clicking the "RoomNumberAnnotator" project
and choosing "Add UIMA Nature".
+ 	Confirm the dialog that appears with "Yes" to add the UIMA nature, then press "OK"
to confirm the status message dialog.
+ 	This creates a default directory layout with folders useful for annotator component development.
+  3. 	Project - Right click - Add UIMA nature
+  4. 	Configure the build path (create the variable UIMA_HOME):
+ 	*	Right-click the RoomNumberAnnotator project and choose Build Path -> Configure
Build Path.
+ 	*	Click the "Add Variable..." button and select the "UIMA_HOME" variable. If the variable
does not exist yet, add it now using "Configure Variables", setting it to the directory where
UIMA is installed.
+ 	*	Click the "Extend..." button and choose uima-core.jar in the "lib" directory. You could
add other JARs from the UIMA lib directory, but uima-core.jar is the only one needed for this project.
+ 	*	Close all dialogs with the "OK" button.
+  5.	Define Annotator type
+ 	*	Right-click on the "desc" folder of your project and choose "New -> Other"
+ 	*	Select "Analysis Engine Descriptor" from the "UIMA" folder and press "Next"
+ 	*	Enter "RoomNumberAnnotatorDescriptor.xml" as file name, and press "Finish"
+  6.	Add new type (RoomNumber) to the RoomNumberAnnotatorDescriptor.xml
+ 	*	Open the descriptor in the UIMA Component Descriptor Editor (CDE) by right-clicking
the "RoomNumberAnnotatorDescriptor.xml"
+ 		file and choosing "Open With -> Component Descriptor Editor".
+ 	*	Select the "Type System" tab at the bottom to show the type system definition page.
+ 	*	Press the "Add Type" button to add the new type. Use "org.apache.uima.tutorial.RoomNumber"
+ 		as the type name and finish with "OK". The supertype "uima.tcas.Annotation" is correct and can be left unchanged.
+  7.	Add new feature (building) to type RoomNumber
+ 	*	Select the "org.apache.uima.tutorial.RoomNumber" type by clicking it.
+ 	*	Click the "Add..." button to add a feature to the type, and specify "building" as the feature
name and "uima.cas.String"
+ 		as the range type. This makes "building" a String-valued feature.
+ 	*	Close the dialog by clicking "OK".
+ 	*	Save the descriptor file.
+  8. 	Automatically create Java classes:
+ 	*	Open the descriptor file in the Component Descriptor Editor and select the "Type System"
tab.
+ 	*	Press the "JCasGen" button to trigger the Java class generation.
+ 		The generated classes are added to the "src" folder of your project in a separate
package.
+  9.	Write Java code for the Annotator
+ 	*	Right-click on the "src" folder and select "New -> Class"
+ 	*	Package: org.apache.uima.tutorial.ex1
+ 		Name: RoomNumberAnnotator
+ 		Superclass: org.apache.uima.analysis_component.JCasAnnotator_ImplBase
+  10.	Test the Annotator:
+ 	*	Run - Run as - Run configurations - Java Application - UIMA CAS Visual debugger
+ 	*	Select the "User Entries" in the classpath tab and press the "Add Projects..." button
+ 	*	Mark the "RoomNumberAnnotator" project in the upcoming dialog and finish with "OK"
+ 	*	Run the CAS Visual Debugger (CVD) by selecting "Run"
+ 	*	Choose "Run -> Load AE" and select the RoomNumberAnnotatorDescriptor.xml file in the
desc folder of your Eclipse project
+ 	*	Copy and paste the text below into the text section of the CVD for testing
+ 
+        {{{
+         April 7, 2004 Distillery Lunch Seminar
+ 	UIMA and its Metadata
+ 	12:00PM-1:00PM in HAW GN-K35
+ 
+ 	April 16, 2004 KM & I Department Tea
+ 	Title: An Eclipse-based TAE Configurator Tool
+ 	3:00PM-4:30PM in HAW GN-K35
+ 
+ 	May 11, 2004 UIMA Tutorial
+ 	9:00AM-5:00PM in YKT 20-001
+         }}}
+ 
+ 	*	To run the annotator on the specified text, choose "Run -> Run RoomNumberAnnotatorDescriptor"
+  11. Create JAR file from Project: Right-click on the Project - Export - Java - JAR file
+  12. Copy the JAR file to SOLR_HOME/example/solr/collection1/lib
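Step 9 above only names the class to create and leaves its body open. The following is a minimal plain-Java sketch of the matching logic such an annotator could apply to the test text, not the annotator itself. The two regular expressions and the building names "Yorktown" and "Hawthorne" are assumptions chosen to fit the sample room numbers (20-001, GN-K35), and the `Room` class is a hypothetical stand-in for the `RoomNumber` type generated by JCasGen. In a real annotator this logic would presumably sit inside `process(JCas)`, and each match would be stored as a `RoomNumber` annotation with its begin and end offsets and its `building` feature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java sketch of room number matching; it does not depend on UIMA. */
class RoomNumberSketch {

    /** Hypothetical stand-in for the generated RoomNumber annotation type. */
    static final class Room {
        final int begin;        // offset of the first matched character
        final int end;          // offset just past the last matched character
        final String building;  // value for the "building" feature
        final String text;      // the matched room number

        Room(int begin, int end, String building, String text) {
            this.begin = begin;
            this.end = end;
            this.building = building;
            this.text = text;
        }
    }

    // Assumed patterns: "20-001" style and "GN-K35" style room numbers.
    private static final Pattern YORKTOWN =
            Pattern.compile("\\b[0-4]\\d-[0-2]\\d\\d\\b");
    private static final Pattern HAWTHORNE =
            Pattern.compile("\\b[G1-4][NS]-[A-Z]\\d\\d\\b");

    /** Returns all matches, Yorktown-style first, then Hawthorne-style. */
    static List<Room> findRooms(String docText) {
        List<Room> rooms = new ArrayList<>();
        collect(YORKTOWN, "Yorktown", docText, rooms);
        collect(HAWTHORNE, "Hawthorne", docText, rooms);
        return rooms;
    }

    private static void collect(Pattern pattern, String building,
                                String docText, List<Room> rooms) {
        Matcher m = pattern.matcher(docText);
        while (m.find()) {
            rooms.add(new Room(m.start(), m.end(), building, m.group()));
        }
    }

    public static void main(String[] args) {
        String text = "12:00PM-1:00PM in HAW GN-K35\n9:00AM-5:00PM in YKT 20-001";
        for (Room r : findRooms(text)) {
            System.out.println(r.building + " " + r.text
                    + " [" + r.begin + "," + r.end + ")");
        }
    }
}
```

Keeping the offsets alongside each match mirrors how a UIMA annotation marks a span of the document text, which is what the CAS Visual Debugger highlights when the annotator is run.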
+ 
+ 
  
  == SolrUIMA UpdateRequestProcessor ==
  The SolrUIMA UpdateRequestProcessor is a custom UpdateRequestProcessor that takes document(s)
being indexed, sends them to a UIMA pipeline and then returns the document(s) enriched with
the specified metadata.
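The flow described above (take a document, run its text through an analysis pipeline, hand back the document with extra metadata) can be pictured with a small plain-Java sketch. This is an illustration of the idea only and does not use the Solr or UIMA APIs: the document is modelled as a `Map`, the pipeline as a `Function`, and the names `EnrichingProcessorSketch`, `enrich`, `sourceField`, and `targetField` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Conceptual sketch of an enriching update step; no Solr or UIMA classes involved. */
class EnrichingProcessorSketch {

    /**
     * Returns a copy of the document in which targetField holds whatever the
     * (hypothetical) pipeline extracted from sourceField. The input is not modified.
     */
    static Map<String, Object> enrich(Map<String, Object> doc,
                                      String sourceField,
                                      String targetField,
                                      Function<String, List<String>> pipeline) {
        Map<String, Object> enriched = new LinkedHashMap<>(doc);
        Object text = doc.get(sourceField);
        if (text != null) {
            List<String> values = pipeline.apply(text.toString());
            if (!values.isEmpty()) {
                enriched.put(targetField, values);
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("text", "Seminar in HAW GN-K35");

        // Toy pipeline: treat every token containing a hyphen as a finding.
        Function<String, List<String>> pipeline = text -> {
            List<String> found = new ArrayList<>();
            for (String token : text.split("\\s+")) {
                if (token.contains("-")) {
                    found.add(token);
                }
            }
            return found;
        };

        System.out.println(enrich(doc, "text", "rooms", pipeline));
    }
}
```

In the actual contrib module the mapping from annotations to document fields is driven by configuration rather than by method arguments; the sketch only shows the direction of data flow.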
  
  
  === Installation ===
-  1. Go to dev/solr/contrib/uima and run 'ant clean dist'
-  2. get the package apache-solr-uima-4.0-SNAPSHOT.jar together with the jars under the dev/solr/contrib/uima/lib
directory and paste everything inside one of the lib directories of your Solr instance (defined
inside the solrconfig.xml).  You may need to create the lib directory for a specific core.

+  1. Download the latest Solr 4.x release from [[http://www.apache.org/dyn/closer.cgi/lucene/solr/]]
+  2. Copy the following files from the Solr release to the lib directory of the Solr core you are
using (in this case solr/example/solr/collection1)
    {{{
    mkdir solr/example/solr/collection1/lib
-   cp solr/dist/apache-solr-uima*.jar solr/example/solr/collection1/lib
+   cp solr/dist/solr-uima*.jar solr/example/solr/collection1/lib
    cp solr/contrib/uima/lib/*.jar solr/example/solr/collection1/lib/
-   cp solr/build/contrib/solr-uima/lucene-libs/lucene-analyzers-uima-4.0-SNAPSHOT.jar solr/example/solr/collection1/lib/
+   cp solr/contrib/uima/lucene-libs/lucene-analyzers-uima*.jar solr/example/solr/collection1/lib/
    }}}
  
-  3. modify your Solr instance config files as described in the [[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt|solr/contrib/solr-uima/README.txt]]
+  3. Modify your Solr instance config files as described in the [[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/README.txt|solr/contrib/solr-uima/README.txt]]
-  4. run your Solr instance and enjoy UIMA enriching documents being indexed
+  4. Run your Solr instance and enjoy UIMA enriching documents being indexed
  
  === Configuration ===
  
@@ -57, +138 @@

  see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]
  
  === UIMA components used ===
- UIMA supports the use of existing analysis engines (see [[http://uima.apache.org/sandbox.html|here]]
and [[http://uima.apache.org/external-resources.html|here]]) as long as the creation of custom
components. 
+ UIMA supports the use of existing analysis engines (see [[http://uima.apache.org/sandbox.html|here]]
and [[http://uima.apache.org/external-resources.html|here]]) as well as the creation of custom
components.
  
  The current contrib/uima module uses a predefined set of components :
   1. [[http://uima.apache.org/sandbox.html#whitespace.tokenizer|WhitespaceTokenizer]]
@@ -105, +186 @@

  
  One can use the default one bundled inside the component or create a new one.
  
- For example to use one of the default Dictionary Annotator Analysis Engine descriptors use
the following (which runs Whitespace Tokenizer and then Dictionary Annotator): 
+ For example, to use one of the default Dictionary Annotator Analysis Engine descriptors, use
the following (which runs Whitespace Tokenizer and then Dictionary Annotator):
  {{{
    <config>
      ...
