lucene-solr-user mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Taxonomy and Faceting
Date Mon, 13 Dec 2010 16:01:41 GMT
With the SOLR-2129 patch you can plug an Apache UIMA [1] pipeline into Solr
to enrich documents as they are indexed.
The base pipeline provided with the patch uses the following blocks (see
OverridingParamsExtServicesAE.xml):

        <node>AggregateSentenceAE</node>
        <node>OpenCalaisAnnotator</node>
        <node>TextKeywordExtractionAEDescriptor</node>
        <node>TextLanguageDetectionAEDescriptor</node>
        <node>TextCategorizationAEDescriptor</node>
        <node>TextConceptTaggingAEDescriptor</node>
        <node>TextRankedEntityExtractionAEDescriptor</node>

This pipeline tokenizes the text, tags each token with its part of speech
and extracts sentences using WhitespaceTokenizer and HMMTagger, then adds
the named entities and language detected by OpenCalaisAnnotator and
AlchemyAPIAnnotator.
The parameters you pointed out are relevant only if you use
OpenCalaisAnnotator and AlchemyAPIAnnotator. As you can see, they are
runtime parameters, so whether you need them, or different ones, depends on
which Analysis Engine you are running.
However, you can change the pipeline to use whatever blocks you want,
provided they are UIMA compliant, by specifying the corresponding Analysis
Engine descriptor inside the tag:
           <analysisEngine>/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</analysisEngine>
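
As a rough sketch of what swapping the pipeline could look like: the
descriptor path and file name below (/org/example/desc/LocalOnlyAE.xml) are
hypothetical, and the second fragment assumes a standard UIMA aggregate
descriptor with a fixed flow. It keeps only the local sentence/POS block,
so under that assumption the AlchemyAPI and OpenCalais keys should not be
needed.

```xml
<!-- Fragment 1: in the Solr UIMA configuration, point at your own
     aggregate descriptor (hypothetical path, not shipped with the patch). -->
<analysisEngine>/org/example/desc/LocalOnlyAE.xml</analysisEngine>

<!-- Fragment 2: inside that aggregate descriptor, a fixed flow that runs
     only the local delegate and skips the external-service annotators. -->
<flowConstraints>
  <fixedFlow>
    <node>AggregateSentenceAE</node>
  </fixedFlow>
</flowConstraints>
```

Each node named in the flow still has to be declared as a delegate in the
same descriptor (under delegateAnalysisEngineSpecifiers), with a key that
matches the node name.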
There are many other engines you can use and configure with SOLR-2129; see
[2] and [3].
I hope this clarifies things a little more.
Cheers,
Tommaso

[1] : http://uima.apache.org
[2] : http://uima.apache.org/sandbox.html
[3] : http://uima.apache.org/external-resources.html

2010/12/13 webdev1977 <webdev1977@gmail.com>

> Based on this:
>
>      <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
>      <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
>      <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
>      <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
>      <entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
>      <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
>
> ...this can't be used unless you use some sort of processing engine?  I am
> playing around with some other open source tagging software, but I have yet
> to get very far.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Taxonomy-and-Faceting-tp2028442p2079148.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
