Message-ID: <30425833.384271285333774828.JavaMail.jira@thor>
Date: Fri, 24 Sep 2010 09:09:34 -0400 (EDT)
From: "Tommaso Teofili (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Updated: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
In-Reply-To: <13511697.335581285134934066.JavaMail.jira@thor>

[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tommaso Teofili updated SOLR-2129:
----------------------------------

    Attachment: SOLR-2129-asf-headers.patch

Same patch, plus the required ASF headers on code and XML files.

> Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>         Attachments: SOLR-2129-asf-headers.patch, SOLR-2129.patch
>
>
> Provide components that let Apache UIMA's automatic metadata extraction be used when indexing documents.
> The purpose is to take the unstructured information "inside" a document and create structured metadata (as fields) to enrich each document.
> Basically, this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can easily be extended by adding or selecting different UIMA analysis engines, either taken from UIMA repositories on the web or created from scratch.
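
The approach described in the issue (a processor in the update chain that runs analysis on a document's text and writes the results back as fields) can be sketched roughly as follows. This is only an illustration under loud assumptions: `Analyzer`, `Processor`, `EnrichingProcessor` and `toySentences` are hypothetical stand-ins invented here, not Solr's UpdateRequestProcessor API, not UIMA's API, and not the code in the attached patch. A document is modelled as a plain map of field names to values, and the "analysis engine" is a trivial sentence splitter.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EnrichingProcessorSketch {

    /** Hypothetical stand-in for an analysis engine: text in, metadata fields out. */
    interface Analyzer {
        Map<String, List<String>> analyze(String text);
    }

    /** Hypothetical stand-in for one step in an update processing chain. */
    interface Processor {
        void processAdd(Map<String, Object> doc);
    }

    /** Runs the analyzer on one text field and copies its results into new fields. */
    static final class EnrichingProcessor implements Processor {
        private final String textField;
        private final Analyzer analyzer;
        private final Processor next; // null when this is the last step in the chain

        EnrichingProcessor(String textField, Analyzer analyzer, Processor next) {
            this.textField = textField;
            this.analyzer = analyzer;
            this.next = next;
        }

        @Override
        public void processAdd(Map<String, Object> doc) {
            Object text = doc.get(textField);
            if (text instanceof String) {
                // Each extracted annotation type becomes a (multi-valued) field.
                Map<String, List<String>> metadata = analyzer.analyze((String) text);
                for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
                    doc.put(e.getKey(), e.getValue());
                }
            }
            // Hand the (possibly enriched) document to the rest of the chain.
            if (next != null) {
                next.processAdd(doc);
            }
        }
    }

    /** Toy analyzer (an assumption, not UIMA): splits text after '.', '!' or '?'. */
    static Map<String, List<String>> toySentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("(?<=[.!?])\\s+")) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put("sentence", sentences);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("text", "Solr indexes documents. UIMA extracts metadata.");

        Processor chain =
            new EnrichingProcessor("text", EnrichingProcessorSketch::toySentences, null);
        chain.processAdd(doc);

        System.out.println(doc);
    }
}
```

Keeping the analyzer behind a small interface mirrors the extensibility point the issue mentions: swapping in a different analysis engine should not require changing the processor that writes fields onto the document.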