lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrUIMA" by TommasoTeofili
Date Wed, 09 Mar 2011 08:49:09 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrUIMA" page has been changed by TommasoTeofili.
http://wiki.apache.org/solr/SolrUIMA?action=diff&rev1=8&rev2=9

--------------------------------------------------

  If the merge attribute is false, each specified field is analyzed separately; if merge is true, the contents of the listed fields are merged and analyzed only once.
  
  
+ see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]
  
- see [[https://issues.apache.org/jira/browse/SOLR-2129|SOLR-2129]]
+ ==== UIMA components used ====
+ UIMA supports the use of existing analysis engines (see [[http://uima.apache.org/sandbox.html|here]] and [[http://uima.apache.org/external-resources.html|here]]) as well as the creation of custom components.
+ 
+ The current contrib/uima module uses a predefined set of components:
+  1. [[http://uima.apache.org/sandbox.html#whitespace.tokenizer|WhitespaceTokenizer]]
+  2. [[http://uima.apache.org/sandbox.html#tagger.annotator|HMMTagger]]
+  3. [[http://uima.apache.org/sandbox.html#opencalais.annotator|OpenCalaisAnnotator]]
+  4. [[http://uima.apache.org/sandbox.html#alchemy.annotator|AlchemyAPIAnnotator]]
+ 
+ These components are arranged in a pipeline inside the [[https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/main/resources/org/apache/uima/desc/OverridingParamsExtServicesAE.xml|OverridingParamsExtServicesAE]] Analysis Engine descriptor. As the following descriptor fragment shows,
+ {{{
+         <node>AggregateSentenceAE</node>
+         <node>OpenCalaisAnnotator</node>
+         <node>TextKeywordExtractionAEDescriptor</node>
+         <node>TextLanguageDetectionAEDescriptor</node>
+         <node>TextCategorizationAEDescriptor</node>
+         <node>TextConceptTaggingAEDescriptor</node>
+         <node>TextRankedEntityExtractionAEDescriptor</node>
+ }}}
+ the first node represents an aggregate Analysis Engine that includes the Whitespace Tokenizer and HMM Tagger (recognizing sentences); the second node uses the Open Calais Annotator to extract named entities; the remaining nodes use different Alchemy API Annotator services to detect keywords, language, document category, discovered concepts and named entities.
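+ 
+ For orientation, such <node> entries in a UIMA aggregate descriptor normally sit inside a fixed flow definition. The fragment below is only a sketch of that surrounding structure, assuming the usual fixedFlow layout; the actual OverridingParamsExtServicesAE descriptor may contain more than is shown here.
+ {{{
+ <analysisEngineMetaData>
+   <flowConstraints>
+     <!-- assumed layout: a fixed flow runs the delegates in the listed order -->
+     <fixedFlow>
+       <node>AggregateSentenceAE</node>
+       <node>OpenCalaisAnnotator</node>
+       <!-- remaining Alchemy API nodes follow in the order listed above -->
+     </fixedFlow>
+   </flowConstraints>
+ </analysisEngineMetaData>
+ }}}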
+ 
+ ===== Using other UIMA components =====
+ To use different UIMA components inside the contrib/uima module you need to:
+  1. import the component jar
+  2. change the descriptor inside solrconfig/uimaConfig/analysisEngine element
+  3. optionally adjust the Analysis Engine configuration
+  4. change the types and features' mapping inside solrconfig/uimaConfig/fieldMapping
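+ 
+ As a rough illustration of where the second and last steps apply, a uimaConfig section might look like the sketch below. Only the uimaConfig, analysisEngine and fieldMapping element names come from this page; the descriptor path, the child elements of fieldMapping, and the type, feature and field names are hypothetical and should be checked against your own solrconfig.xml.
+ {{{
+ <uimaConfig>
+   <!-- hypothetical classpath location of the descriptor to load -->
+   <analysisEngine>/org/example/desc/MyAggregateAE.xml</analysisEngine>
+   <!-- assumed mapping layout: type, feature and field names are placeholders -->
+   <fieldMapping>
+     <type name="org.example.MyAnnotation">
+       <map feature="someFeature" field="my_field"/>
+     </type>
+   </fieldMapping>
+ </uimaConfig>
+ }}}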
+ 
+ ====== Import the component jar ======
+ If you're using Ant, you only need to put the component jar inside the solr/contrib/uima/lib directory.
+ 
+ If you're using Maven, you need to declare the component you want to use inside the <dependencies> element of the generated pom.xml.
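+ 
+ A declaration of this kind could look like the following sketch. The groupId, artifactId and version are placeholders, not real coordinates; replace them with those published for the component you actually want to use.
+ {{{
+ <dependencies>
+   <!-- hypothetical coordinates, for illustration only -->
+   <dependency>
+     <groupId>com.example.uima</groupId>
+     <artifactId>my-annotator</artifactId>
+     <version>1.0.0</version>
+   </dependency>
+ </dependencies>
+ }}}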
+ 
+ ====== Change the descriptor ======
+ 
+ ====== Adjust AE configuration (optional) ======
+ 
+ ====== Change the types and features' mapping ======
  
  
  == Solrcas ==
