Message-ID: <5176A773.8040309@schor.com>
Date: Tue, 23 Apr 2013 11:23:31 -0400
From: Marshall Schor
To: user@uima.apache.org
Subject: Re: Using PEAR in a application based on Uima framework

Combining annotators written by different groups at different times does
require some level of conformity in the type system, or the creation of type
conversions or mappers.

Some use cases:

Let's presume you have some annotators that work off of "tokens", and an
upstream annotator that produces those tokens, and that you build a system
using both. If the tokenizer produces annotations of type x.y.z.Token (I'm
fully qualifying the name of the type), then the consumer of these tokens
needs to iterate over the type "x.y.z.Token" to get the tokens to work on.

If you later have a better tokenizer, there are two sub-cases.

In the first case, say you find a better tokenizer (let's imagine it properly
handles tokens for non-western languages, or handles multi-word tokens, etc.).
If that tokenizer produces tokens of type x.y.z.Token (unlikely if it comes
from another source, or maybe likely if it is version 2 of the original
tokenizer), then you can just "plug it in". This is true even if the new
definition of x.y.z.Token adds some features (which your downstream annotator
may not define), such as "multi_word".
This is OK because UIMA, before starting processing, collects all the type
systems and merges the types, so that if annotator 1 defines type x.y.z.Token
as having a "multi_word" feature but annotator 2 doesn't, the merged type
definition will in any case have a slot for that feature.

On the other hand, if the new tokenizer is from another company and produces
token annotations of type a.b.c.TTT, then your downstream annotator, which is
looking for tokens of type x.y.z.Token, won't find any. There, you have to
either re-write your annotator to use the new kind of token, or insert some
kind of type mapping annotator in between. Sometimes the type mapping can be
trivial, and other times it can be arbitrarily complex.

When UIMA was first being conceived, there was some thought given to trying to
"standardize" on type systems, to minimize these kinds of issues, but looking
at the vast and diverse community of people and projects working in this area,
it was felt that this was too difficult to accomplish. So UIMA has somewhat of
a compromise: an ability to "merge" different type systems, effectively
creating a union of all the types and features.

HTH -Marshall

On 4/22/2013 9:44 PM, swirl wrote:
> swirl writes:
>
>> I am currently developing a Tomcat application that wraps around Uima to run
>> text mining processes.
>> I am confused over what PEAR can be used for and how it can be used in a
>> Uima-wrapped application.
>>
>> The application is to be deployed as an installed web application at our
>> client's location and it is meant to be more or less a black box to our
>> client. That is, our client should not need to know about the intricacies of
>> Uima or the various analysis engines to perform text mining processes.
>> Our application presents them a simple facade that takes in input from them,
>> runs the input through an analysis pipeline (consisting of annotators, cas
>> consumers, etc) and returns an analysed, annotated document to them.
>>
>> But we also want our application to be easily extensible and changed, in case
>> we have a better version of analysis engine, we want to deploy just the engine
>> to the client without having to re-compile and re-deploy the whole
>> application.
>>
>> Can we make use of PEAR to do the deployment?
>> If so, what about the types used in the analysis engines in the PEAR, how does
>> the deployed application know about the new or modified types in the PEAR?
>>
>>
>
>
> Erhmmm, has anybody done something like this before?
> I really am interested to know how you can do it.
>
> To clarify, I am very interested in how you can mix-match different PEARs,
> possibly from different open source projects, with different type systems,
> and run them in a pipeline as a coherent whole.
>
> How do you resolve the issue that all their type systems are of different
> Java types and be able to use each other's analysis results in the pipeline.
>
> Thanks!
>
>
>
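
P.S. A rough illustration of the two ideas in the reply above: merging type
systems as a union of types and features, and a trivial type mapper from
a.b.c.TTT to x.y.z.Token. This is a toy model in plain Java collections and
does not use the UIMA API. The class TypeSystemSketch, its Span class, and the
merge and mapType methods are all made up for illustration, and the model
ignores details such as feature range types. In a real pipeline the mapper
would be an annotator placed between the two components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model only: plain Java collections, NOT the UIMA API. */
public class TypeSystemSketch {

    /** A bare-bones annotation: a type name plus a text span. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Union of all types and features; each map is type name -> feature names. */
    public static Map<String, Set<String>> merge(List<Map<String, Set<String>>> typeSystems) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Map<String, Set<String>> ts : typeSystems) {
            for (Map.Entry<String, Set<String>> e : ts.entrySet()) {
                // A type declared by several annotators ends up with every feature any of them declared.
                merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return merged;
    }

    /** Trivial type mapper: for each span of type 'from', add a copy of type 'to'. */
    public static List<Span> mapType(List<Span> spans, String from, String to) {
        List<Span> out = new ArrayList<>(spans);
        for (Span s : spans) {
            if (s.type.equals(from)) {
                out.add(new Span(to, s.begin, s.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Annotator 1 declares "multi_word" on x.y.z.Token, annotator 2 does not.
        Map<String, Set<String>> annotator1 = Map.of("x.y.z.Token", Set.of("begin", "end", "multi_word"));
        Map<String, Set<String>> annotator2 = Map.of("x.y.z.Token", Set.of("begin", "end"));
        System.out.println(merge(List.of(annotator1, annotator2)));

        // The other tokenizer emits a.b.c.TTT; the mapper adds x.y.z.Token copies.
        List<Span> spans = List.of(new Span("a.b.c.TTT", 0, 5));
        for (Span s : mapType(spans, "a.b.c.TTT", "x.y.z.Token")) {
            System.out.println(s.type + " [" + s.begin + ", " + s.end + ")");
        }
    }
}
```

The mapper here only copies the span offsets, which is the "trivial" end of the
range; a mapping that has to translate features or split and join tokens would
need correspondingly more logic.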