Subject: RE: Asynchronous UIMA (workflow) ?
Date: Fri, 12 Oct 2007 15:32:52 +0200
Reply-To: uima-user@incubator.apache.org
From: "Pascal Coupet"
To: "greg@holmberg.name"

Hi Greg,

I agree that human intervention is often needed in NLP-related
applications. In an editorial system, for example, you may want
automatically assigned categories to be reviewed and validated. However,
I'm not sure this should be done within a UIMA pipeline. The UIMA
framework is middleware, not the whole application. It looks difficult
to me to manage the case where 10 docs go into a pipeline, 9 go through
at a normal pace, and one gets stuck somewhere for 1 or 2 days awaiting
manual intervention. The framework is distributed and relies on timeouts
to detect errors. You would have to do something special so that this
document does not fail with an error, and hope that the user does not
forget to do the job.

To go back to the editorial system example, it may require getting the
document into the system quickly even if some annotators failed on it.
The application can then make decisions depending on the missing parts
(ask an editor to complete them, hide the document ...).

One way to handle errors is simply to store them within the CAS.
Subsequent annotators can make decisions depending on previous errors.
In your example, no entity extraction will be performed because no
category is available, and the annotator will log an error "unable to
extract entities...", which is different from finding no entity.
The application receiving the CAS at the end of the workflow will
propose it to an editor, who will select categories and then resubmit
the document to the annotation workflow to get it completed.

I think the whole purpose of the UIMA framework is to glue together
various annotation engines and properly manage the distribution of the
work across machines. One workflow can be seen as a meta-annotator that
has business meaning to your company or research center. Ideally, it can
be reused by different applications. So I would try to avoid, as much as
possible, encoding application-specific actions directly within it.

Pascal

Pascal Coupet
Chief Technology Officer & Co-founder
TEMIS INC
1518 Walnut Street, suite 1702, Philadelphia, PA 19102, USA
Tel: +1 215 732 2549 ext 112
Mob: +1 215 609 2514
Fax: +1 215 732 0490
www.temis.com

-----Original Message-----
From: greg@holmberg.name [mailto:holmberg2066@comcast.net]
Sent: Thursday, October 11, 2007 1:18 PM
To: uima-user@incubator.apache.org
Cc: Pascal Coupet
Subject: RE: Asynchronous UIMA (workflow) ?

Pascal--

I was thinking essentially the same thing: serialize the CAS to a file
or database, do your human interaction (possibly including the CAS
Editor), then reload it and resume processing.

It would be nice to generalize it, rather than have two explicit
analysis engines. So a nice enhancement to UIMA would be the ability to
persist not just the CAS but the state of the engine along with it, so
that it could be stopped and restarted at any point.
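
As a rough illustration of the persist-and-resume idea, here is a
minimal sketch in plain Python. It does not use the UIMA API: the
document is a plain dict standing in for a CAS, and NeedsReview, run,
resume, the two toy annotators, and the JSON checkpoint layout are all
hypothetical names made up for this example. It saves the document
together with the index of the step that could not proceed, then reloads
both once a reviewer has supplied the missing data.

```python
import json
import os
import tempfile


class NeedsReview(Exception):
    """Raised by a step that cannot proceed without human input."""


def classify(doc):
    # Toy classifier (assumption): recognizes a single keyword only.
    if "protein" in doc["text"]:
        doc["category"] = "biology"


def extract_entities(doc):
    # Toy extractor (assumption): picks a name catalog by category.
    catalogs = {"biology": ["protein"], "chemistry": ["benzene"]}
    if doc.get("category") not in catalogs:
        raise NeedsReview("category missing or unsupported")
    names = catalogs[doc["category"]]
    doc["entities"] = [n for n in names if n in doc["text"]]


PIPELINE = [classify, extract_entities]


def run(doc, checkpoint_path, start=0):
    """Run steps from index `start`; on NeedsReview, save state and stop."""
    for i in range(start, len(PIPELINE)):
        try:
            PIPELINE[i](doc)
        except NeedsReview as exc:
            # Persist the document plus the step to retry after review.
            state = {"doc": doc, "resume_at": i, "reason": str(exc)}
            with open(checkpoint_path, "w", encoding="utf-8") as f:
                json.dump(state, f)
            return False  # parked, waiting for a reviewer
    return True  # all steps completed


def resume(checkpoint_path, edits):
    """Reload a parked document, apply the reviewer's edits, continue."""
    with open(checkpoint_path, encoding="utf-8") as f:
        state = json.load(f)
    state["doc"].update(edits)
    finished = run(state["doc"], checkpoint_path, state["resume_at"])
    return state["doc"], finished


# Usage: the document is parked because no category was assigned, then
# resumed at the entity extractor once a reviewer has chosen one.
path = os.path.join(tempfile.mkdtemp(), "doc1.json")
parked = not run({"text": "benzene rings"}, path)
reviewed_doc, finished = resume(path, {"category": "chemistry"})
```

Because the checkpoint records which step to retry rather than naming a
specific failure, the same mechanism covers a stop at any step. A real
engine would hold more state than a step index, so this only shows the
shape of the idea.
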
For my purposes, this would be useful if, say, one annotator depended on
finding certain data in the CAS from another annotator, but that earlier
one failed or didn't produce the right data, and I need a user to
produce the data manually.

For example, suppose a taxonomy classifier runs first and a named entity
extractor runs second, and the entity extractor wants to select a name
catalog (NC) to use based on the classification ("if classified biology,
use biology NC, else if classified chemistry use chemistry NC"). If the
classifier doesn't classify at all, or doesn't classify into the right
category (not biology or chemistry), then I would want the user to
classify it manually. So I would persist that document and engine state
and notify the user, who would classify it. I would then restart the
engine, which would move on to run the entity extractor with an NC based
on the user's classification.

Not knowing in advance where in the engine the failure will occur
(failure to classify being only one possibility), I can't create two
explicit engines. Having a general mechanism to persist the state of
the engine would let me handle any failure or missing dependency. NLP
being generally an imprecise process, I foresee human intervention in
the pipeline as a not-infrequent occurrence. So having a mechanism to
deal with that in a general way would be helpful.

This is not a high-priority enhancement for me at the moment, just an
idea for us to kick around.

Greg Holmberg

-------------- Original message ----------------------
From: "Pascal Coupet"
> Hi Thomas,
>
> I think a way to do it is to split this process across 2 workflows. The
> first consumer will get the CAS, eventually store it in XML somewhere
> (file, database ...). A small application will manage the interaction
> with the user (sending mail, reminders ...), watch a return address
> mailbox, update the XCAS and make it available.
> The source of the second workflow will watch for available updated
> XCAS and continue from there. In theory, you can make the consumer of
> the first workflow send the mail and the source of the second watch for
> incoming emails, but I think it will be more difficult to properly
> manage the interaction with users (reminders to respond, statistics,
> routing configuration ...).
>
> Just some thoughts,
>
> Pascal
>
> ________________________________
>
> From: Thomas Francart [mailto:thomas.francart@mondeca.com]
> Sent: Thursday, October 11, 2007 7:01 AM
> To: uima-user@incubator.apache.org
> Subject: Asynchronous UIMA (workflow) ?
>
> Hi all -
>
> I'm wondering whether it would be possible to add an asynchronous step
> in a UIMA pipeline: for example, an analysis engine that would ask for
> user input or a user review of a CAS. My point is that at some point
> in the pipeline, I would like a user to review the state of the CAS,
> maybe add some information, delete some, and so on; the rest of the
> pipeline would then continue upon user validation. (By "user" here I
> don't mean someone who sits in front of a computer and watches the UIMA
> processing take place, but maybe someone receiving an email saying
> "hey, you should have a look and validate that".)
>
> I know this is a generic workflow question, but I was just wondering if
> other people had the same question/requirements with a UIMA
> integration, and if you had some ideas on how it could be
> addressed/solved.
>
> Best,
> Thomas
>
> --
>
> Thomas Francart
> Mondeca
> 3, cité Nollez 75018 Paris France
> Tel: +33 (0)1 44 92 35 04 - Fax: +33 (0)1 44 92 02 59
> Blog: mondeca.wordpress.com
> Web: www.mondeca.com
> Mail: thomas.francart@mondeca.com
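
The suggestion at the top of this thread, storing errors within the CAS
so that later annotators and the receiving application can act on them,
might look roughly like the sketch below. It is plain Python rather than
the UIMA API: the dict standing in for a CAS, the "errors" list, the
keyword rules, the catalogs, and every function name are assumptions
made up for illustration. The point it tries to show is that "no
entities extracted because of an earlier error" stays distinguishable
from "extraction ran and found nothing".

```python
# Hypothetical name catalogs, keyed by category.
CATALOGS = {
    "biology": ["insulin", "hemoglobin"],
    "chemistry": ["toluene", "ethanol"],
}


def classifier(cas):
    """Toy classifier: records a failure in the CAS instead of raising."""
    if "category" in cas:
        return  # already set, e.g. by an editor on resubmission
    if "protein" in cas["text"]:
        cas["category"] = "biology"
    elif "solvent" in cas["text"]:
        cas["category"] = "chemistry"
    else:
        cas["errors"].append(
            {"source": "classifier", "message": "no category assigned"}
        )


def entity_extractor(cas):
    """Toy extractor: skips its work and logs why if no catalog applies."""
    names = CATALOGS.get(cas.get("category"))
    if names is None:
        cas["errors"].append(
            {"source": "entity_extractor",
             "message": "unable to extract entities: no usable category"}
        )
        return  # "entities" stays absent, unlike an empty result
    cas["entities"] = [n for n in names if n in cas["text"]]


def run_pipeline(cas):
    for annotator in (classifier, entity_extractor):
        annotator(cas)
    return cas


def process(text):
    """Send a new document through the whole workflow without stalling."""
    return run_pipeline({"text": text, "errors": []})


def needs_editor(cas):
    """Application-side check made after the workflow has finished."""
    return bool(cas["errors"])


def resubmit(cas, category):
    """Apply the editor's category, clear old errors, and run again."""
    cas["category"] = category
    cas["errors"] = []
    return run_pipeline(cas)


# Usage: the document flows through at normal pace, carrying its errors;
# the application then routes it to an editor and resubmits it.
result = process("weather report")
if needs_editor(result):
    result = resubmit(result, "biology")
```

In this arrangement the pipeline never waits on a person, so the timeout
concern does not arise; the decision about what to do with an incomplete
document stays in the application.
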