uima-user mailing list archives

From "Roberto Franchini" <ro.franch...@gmail.com>
Subject Re: Lucene cas consumer
Date Fri, 05 Dec 2008 22:37:32 GMT
On Fri, Dec 5, 2008 at 4:30 PM, Christof Mueller
<mueller@tk.informatik.tu-darmstadt.de> wrote:
> Jörn Kottmann wrote:
>> I am also interested in a Lucene CAS consumer.
>> Maybe we can work together and set up a sandbox project ?
>>
>> Jörn
> Hi Jörn,
>
> we would be happy to contribute the code of the example Lucene CAS
> consumer as base for the sandbox project.
>
> Christof
>


I've got an index!!!!
Yes: mixing some code from the JENA lucas (I had kept it in a dusty corner
of my hard disk :) ), some from DK and some of my own, I produced an index.
If we want to start a Lucene indexer that's not only a proof of
concept but something really useful, it should be
configurable/extendable.
The "problem" (which is also UIMA's power) is that everyone has their own
type system.
To produce a Lucene document, one extracts information from some
features and applies the right analyzer to each. In my case I use maybe only 10%
of the annotations produced by the analysis pipeline to build a
single Lucene document.
So we need a highly configurable component, able to map only
certain declared features, apply the right analyzer to each, and so on.
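A minimal sketch of that mapping idea in plain Java, with no UIMA or Lucene dependency. Annot, FieldRule and FeatureMapper are hypothetical stand-ins, not the UIMA or Lucene API, and the "analyzers" are simple string transforms standing in for real Lucene Analyzers (a real consumer could route per-field analysis through Lucene's PerFieldAnalyzerWrapper). Only features with a declared rule reach the document; every other annotation in the CAS is ignored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a UIMA annotation: its type name and feature values.
record Annot(String type, Map<String, String> features) {}

// Hypothetical mapping rule: copy feature `feature` of annotations of type `type`
// into the index field `field`, passing it through the analyzer named `analyzer`.
record FieldRule(String type, String feature, String field, String analyzer) {}

class FeatureMapper {
    // Hypothetical analyzer registry: plain string transforms, NOT Lucene Analyzers.
    static final Map<String, UnaryOperator<String>> ANALYZERS = Map.of(
        "keyword", s -> s,                               // value indexed unchanged
        "lowercase", s -> s.toLowerCase(Locale.ROOT));   // value lowercased

    private final List<FieldRule> rules;

    FeatureMapper(List<FieldRule> rules) {
        this.rules = rules;
    }

    // Builds field -> values for one document; annotations with no matching rule are skipped.
    Map<String, List<String>> map(List<Annot> annots) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Annot a : annots) {
            for (FieldRule r : rules) {
                if (!r.type().equals(a.type())) continue;
                String value = a.features().get(r.feature());
                if (value == null) continue;
                UnaryOperator<String> analyzer = ANALYZERS.get(r.analyzer());
                if (analyzer == null) {
                    throw new IllegalArgumentException("unknown analyzer: " + r.analyzer());
                }
                doc.computeIfAbsent(r.field(), k -> new ArrayList<>()).add(analyzer.apply(value));
            }
        }
        return doc;
    }
}
```

For example, with rules for Person.name (keyword) and Token.lemma (lowercase), a CAS holding a Person "Christof", a Token with lemma "Index" and a Sentence yields {person=[Christof], lemma=[index]}; the Sentence contributes nothing because no rule mentions it.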
Many ways are possible:
- completely programmatic: the indexer is abstract and must be
extended to implement the right mapping for a specific type system
and pipeline
- configurable: mapping rules are defined in a descriptor file; the
JENA component followed this approach
- a mix of the two: some mappings are configured, others are implemented in code
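The third option could look roughly like this. It assumes a hypothetical one-rule-per-line descriptor format ("Type.feature = field") rather than a real UIMA descriptor, and all class names are illustrative. Declarative rules handle the plain feature-to-field copies, while subclasses override customFields() for mappings that need code. The purely configurable case is then just an anonymous subclass (new AbstractIndexer(lines) {}), and the purely programmatic one a subclass with an empty rule list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a UIMA annotation.
record Ann(String type, Map<String, String> features) {}

// "Mix" approach: configured rules plus a programmatic hook.
abstract class AbstractIndexer {
    private final Map<String, String> rules = new LinkedHashMap<>(); // "Type.feature" -> field

    // Parses hypothetical descriptor lines of the form "Type.feature = field".
    protected AbstractIndexer(List<String> descriptorLines) {
        for (String line : descriptorLines) {
            String[] parts = line.split("=");
            if (parts.length != 2) throw new IllegalArgumentException("bad rule: " + line);
            rules.put(parts[0].trim(), parts[1].trim());
        }
    }

    // Programmatic extension point; the default adds nothing.
    protected void customFields(List<Ann> annots, Map<String, List<String>> doc) {}

    // Applies the configured rules first, then lets the subclass add its own fields.
    final Map<String, List<String>> toDocument(List<Ann> annots) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        for (Ann a : annots) {
            for (Map.Entry<String, String> f : a.features().entrySet()) {
                String field = rules.get(a.type() + "." + f.getKey());
                if (field != null) {
                    doc.computeIfAbsent(field, k -> new ArrayList<>()).add(f.getValue());
                }
            }
        }
        customFields(annots, doc);
        return doc;
    }
}

// Hypothetical subclass: stores the number of Person annotations,
// something a flat feature-to-field rule cannot express.
class PersonCountIndexer extends AbstractIndexer {
    PersonCountIndexer(List<String> descriptorLines) {
        super(descriptorLines);
    }

    @Override
    protected void customFields(List<Ann> annots, Map<String, List<String>> doc) {
        long n = annots.stream().filter(a -> a.type().equals("Person")).count();
        doc.put("personCount", List.of(Long.toString(n)));
    }
}
```

The design choice here is that configuration covers the common, boring cases and code is only needed for the exceptions, so each project's type system can be plugged in without forking the indexer.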

My 2€cents.
Regards,
Roberto


-- 
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:ro.franchini@gmail.com skype:ro.franchini
