lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
Date Mon, 25 Oct 2010 19:42:19 GMT


Grant Ingersoll commented on SOLR-2129:

Cool stuff, Tommaso.  I'm starting to look at adding classifiers into Solr via Mahout, so
thought I would look at this too.  

Couple of early things, based on looking at the getting started instructions.

# I think we should do what we do with Tika and provide a way for users to map UIMA output
to Solr fields, as opposed to having to hardcode specific fields.
# For the jars, have a look at how clustering is set up.  We should be able to just point
solrconfig.xml at the UIMA libs under contrib/uima/lib instead of having to copy them around.
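
To make the first point concrete, here is a minimal sketch of what a user-configurable mapping could look like. It is plain Java with no Solr or UIMA dependencies; the class name UimaFieldMapper, the Annotation holder, and the type and field names are all hypothetical and are not taken from the attached patch or from either project's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: maps UIMA annotation types to Solr field names via configuration. */
public class UimaFieldMapper {

    /** Stand-in for one UIMA annotation: its type name and the text it covers. */
    public static final class Annotation {
        final String type;
        final String coveredText;

        public Annotation(String type, String coveredText) {
            this.type = type;
            this.coveredText = coveredText;
        }
    }

    // Annotation type name -> Solr field name, supplied by the user instead of hardcoded.
    private final Map<String, String> typeToField;

    public UimaFieldMapper(Map<String, String> typeToField) {
        this.typeToField = new LinkedHashMap<>(typeToField);
    }

    /** Groups the covered text of each mapped annotation under its configured field name. */
    public Map<String, List<String>> toSolrFields(List<Annotation> annotations) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            String field = typeToField.get(a.type);
            if (field == null) {
                continue; // annotation types without a mapping are skipped
            }
            fields.computeIfAbsent(field, k -> new ArrayList<>()).add(a.coveredText);
        }
        return fields;
    }
}
```

In a real module the map would presumably be read from configuration, and the resulting values added to the document being indexed; both of those steps are left out here.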

> Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
> -------------------------------------------------------------------------------
>                 Key: SOLR-2129
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments:, SOLR-2129-asf-headers.patch, SOLR-2129.patch
> Provide components to enable Apache UIMA automatic metadata extraction to be exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can be easily extended by adding or selecting different UIMA analysis engines, either taken from UIMA repositories on the web or created from scratch.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
