Message-ID: <29615486.83881288116381326.JavaMail.jira@thor>
Date: Tue, 26 Oct 2010 14:06:21 -0400 (EDT)
From: "Grant Ingersoll (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
In-Reply-To: <13511697.335581285134934066.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925068#action_12925068 ]

Grant Ingersoll commented on
SOLR-2129:
---------------------------------------

bq. I can change the current hardcoded mapping mechanism using instead a simple mapping between UIMA extracted types/features and field names defined inside solrconfig.xml

Try to reuse the same syntax as the mapping in the ExtractingRequestHandler.

bq. A different option could be to develop a SolrCASConsumer component in UIMA (similar to Lucas [1], Lucene CAS Consumer) providing full control on how UIMA annotations and features can be mapped to Solr fields, but on UIMA side

I've been struggling with these kinds of questions a lot lately. That is, the marriage of two projects. Where should the code go? Setting up another ASF project is a pain in the amount of hoops to jump through. Apache Labs doesn't cut it for a number of reasons. Hosting on Github or Google Code is OK, but loses the ASF community aspect. Sigh.

bq. Regarding point 2 the jars are already under contrib/uima/lib so I can modify the sample solrconfig.xml adding the proper tag.

Yep, exactly what I had in mind.

> Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can be easily extended by adding or selecting different UIMA analysis engines, either from UIMA repositories on the web or by creating new ones from scratch.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org