From: "Roberto Franchini"
To: uima-user@incubator.apache.org
Date: Fri, 5 Dec 2008 23:37:32 +0100
Subject: Re: Lucene cas consumer

On Fri, Dec 5, 2008 at 4:30 PM, Christof Mueller wrote:
> Jörn Kottmann wrote:
>> I am also interested in a Lucene CAS consumer.
>> Maybe we can work together and set up a sandbox project?
>>
>> Jörn
> Hi Jörn,
>
> we would be happy to contribute the code of the example Lucene CAS
> consumer as base for the sandbox project.
>
> Christof
>

I've got an index!!!!
Yes, by mixing some code from the JENA lucas (I kept it in a dusty corner of my hard disk :) ), some from DK and some of my own, I now produce an index.

If we want to start a Lucene indexer that's not only a proof of concept but something really useful, it should be configurable and extendable.
The "problem", which is also UIMA's power, is that everyone has their own type system. To produce a Lucene document, one extracts information from some features and applies the right analyzer to each.
In my case I use maybe only 10% of the annotations produced by the analysis pipeline to produce a single Lucene doc. So we need a highly configurable component that maps only certain declared features, applies the right analyzer to each, and so on.

Many ways are possible:
- completely programmatic: the indexer is abstract and must be extended to implement the right mapping for a specific type system and pipeline
- configurable: mapping rules are defined in a descriptor file; the JENA component followed this way
- a mix of the two: some mappings are configured, others are implemented

My 2 €cents.
Regards,
Roberto

--
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:ro.franchini@gmail.com
skype:ro.franchini
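
For illustration, the third option in the message (some mappings configured, others implemented) could look roughly like the minimal Java sketch below. Everything in it is an assumption made for the example: the names FeatureMappingSketch, MappingRule, IndexField, ConfigurableIndexer and customize are hypothetical, features are modelled as a plain Map of strings, and analyzers are referred to only by name. It deliberately uses plain Java collections as stand-ins and does not call the real UIMA or Lucene APIs, and it is not the code of the JENA or DK components.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FeatureMappingSketch {

    /** Hypothetical rule: index one feature as one field, with a named analyzer. */
    record MappingRule(String featureName, String fieldName, String analyzerName) {}

    /** Hypothetical stand-in for a Lucene field: a value plus the analyzer to apply. */
    record IndexField(String value, String analyzerName) {}

    /** Applies the configured rules, then calls a hook for coded mappings. */
    static class ConfigurableIndexer {
        private final List<MappingRule> rules;

        ConfigurableIndexer(List<MappingRule> rules) {
            this.rules = List.copyOf(rules);
        }

        /** Features without a rule are ignored, so only declared ones are indexed. */
        final Map<String, IndexField> buildDocument(Map<String, String> features) {
            Map<String, IndexField> doc = new LinkedHashMap<>();
            for (MappingRule rule : rules) {
                String value = features.get(rule.featureName());
                if (value != null) {
                    doc.put(rule.fieldName(), new IndexField(value, rule.analyzerName()));
                }
            }
            customize(features, doc);
            return doc;
        }

        /** Extension point: override to add mappings that are easier to code. */
        protected void customize(Map<String, String> features, Map<String, IndexField> doc) {
            // No programmatic mappings by default.
        }
    }

    public static void main(String[] args) {
        // In a real component these rules would be read from a descriptor file.
        List<MappingRule> rules = List.of(
                new MappingRule("title", "doc_title", "standard"),
                new MappingRule("lemma", "doc_lemma", "whitespace"));

        ConfigurableIndexer indexer = new ConfigurableIndexer(rules) {
            @Override
            protected void customize(Map<String, String> features, Map<String, IndexField> doc) {
                // Programmatic part: derive a field that no single rule can express.
                String author = features.getOrDefault("author", "unknown");
                String year = features.getOrDefault("year", "n.d.");
                doc.put("citation", new IndexField(author + " (" + year + ")", "keyword"));
            }
        };

        Map<String, String> features = Map.of(
                "title", "Lucene CAS consumer",
                "lemma", "consume",
                "pos", "NN", // produced by the pipeline but never indexed
                "author", "Roberto",
                "year", "2008");

        indexer.buildDocument(features)
               .forEach((field, f) -> System.out.println(field + " -> " + f));
    }
}
```

Dropping the override gives the purely configurable variant, and an empty rule list with only the override gives the purely programmatic one, so the same skeleton would cover all three options listed in the message.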