lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "OpenNLP" by LanceXNorskog
Date Sun, 26 Aug 2012 07:43:17 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "OpenNLP" page has been changed by LanceXNorskog:
http://wiki.apache.org/solr/OpenNLP?action=diff&rev1=8&rev2=9

  Now, go to trunk-dir/solr and run 'ant test-contrib'. It compiles the OpenNLP Lucene and
Solr code against the OpenNLP libraries and uses the small model files.
  
  === Deployment to Solr ===
- A Solr core requires schema types for the OpenNLP Tokenizer & Filter, and also requires
model files.  The distribution includes a schema.xml file in solr/contrib/opennlp/src/test-files/opennlp/solr/conf/
which demonstrates OpenNLP-based analyzers. It does not contain other text types (to avoid
falling out of date with the full text suite). You should copy the text types from this file
into your test collection schema.xml, and download "real" models for testing. Also, you may
have to add the OpenNLP lib directory to your solr/lib or solr/cores/collection/lib directory.
+ A Solr core requires schema types for the OpenNLP Tokenizer & Filter, and also requires
"real" model files.  The distribution includes a schema.xml file in solr/contrib/opennlp/src/test-files/opennlp/solr/conf/
which demonstrates OpenNLP-based analyzers. It does not contain other text types (to avoid
falling out of date with the full text suite). You should copy the text types from this file
into your test collection schema.xml, and download "real" models for testing. Also, you may
have to add the OpenNLP lib directory to your solr/lib or solr/cores/collection/lib directory.
The text types assume that cores/collection/conf/opennlp contains the OpenNLP model files.

  
- Now, download these model files to solr/contrib/opennlp/src/test-files/opennlp/solr/conf/opennlp/
+ The server listed below hosts "real" models from the OpenNLP project. Download the model files to your solr/cores/collection/conf/opennlp
directory.
  
   * http://opennlp.sourceforge.net/models-1.5/
    * The English-language models start with 'en'. The 'maxent' models are preferred to the
'perceptron' models.
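
The selection rule above (English models start with 'en', 'maxent' preferred over 'perceptron') can be sketched in Python. This is only an illustration, not part of the wiki page: the function names are hypothetical, and the file names in the usage note are assumed examples, so check the directory listing on the server for the real names. The target directory is the one the page names.

```python
import os
import urllib.request
from pathlib import PurePosixPath

# Base URL quoted on the wiki page.
MODEL_BASE_URL = "http://opennlp.sourceforge.net/models-1.5/"
# Target directory named on the wiki page.
DEFAULT_CONF_DIR = "solr/cores/collection/conf/opennlp"


def pick_english_models(available):
    """Keep English ('en...') models; drop a perceptron model when a
    maxent model with the otherwise identical name is available."""
    english = [name for name in available if name.startswith("en")]
    superseded = {
        name.replace("maxent", "perceptron")
        for name in english
        if "maxent" in name
    }
    return sorted(name for name in english if name not in superseded)


def download_plan(available, conf_dir=DEFAULT_CONF_DIR):
    """Return (url, target_path) pairs for the selected models."""
    return [
        (MODEL_BASE_URL + name, str(PurePosixPath(conf_dir) / name))
        for name in pick_english_models(available)
    ]


def fetch(plan):
    """Download every file in the plan (needs network access)."""
    for url, target in plan:
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(url, target)
```

For example, `download_plan(["en-sent.bin", "en-pos-maxent.bin", "en-pos-perceptron.bin"])` keeps the sentence model and the maxent tagger and skips the perceptron tagger; passing the result to `fetch` performs the downloads.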
  
- Your Solr should start without any Exceptions. At this point, go to the Schema analyzer,
pick the 'text_opennlp_pos' field type, and post a sentence or two to the analyzer. You should
get text tokenized with payloads. Unfortunately, the analysis page shows them as bytes instead
of text. If you would like this in text form, then go vote on SOLR-3493.
+ Your Solr should start without any Exceptions. At this point, go to the Schema analyzer,
pick the 'text_opennlp_pos' field type, and post a sentence or two to the analyzer. You should
get text tokenized with payloads. Unfortunately, the analysis page shows them as bytes instead
of text. If you would like to see them in text form, then go vote on SOLR-3493 (or implement
it).
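
As an alternative to the admin analysis page, the same check can be scripted. The sketch below only builds a request URL for Solr's field analysis handler. It assumes that handler is registered at /analysis/field in solrconfig.xml (as in Solr's example configuration) and that Solr runs at a hypothetical http://localhost:8983/solr/collection; neither detail comes from the wiki page.

```python
from urllib.parse import urlencode


def field_analysis_url(core_url, field_type, text):
    """Build a field-analysis request URL for one field type and one text.

    core_url is the base URL of the Solr core (an assumption, adjust it
    to your installation).
    """
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": text,
        "wt": "json",
    }
    return core_url.rstrip("/") + "/analysis/field?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host, port, and core name.
    print(field_analysis_url(
        "http://localhost:8983/solr/collection",
        "text_opennlp_pos",
        "Dogs bark.",
    ))
```

Opening the printed URL in a browser or fetching it with any HTTP client returns the analysis result for the 'text_opennlp_pos' field type.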
  
  == Licensing ==
  The OpenNLP library is licensed under the Apache License. The 'jwnl' library is 'BSD-like'.
