On Fri, Dec 5, 2008 at 4:30 PM, Christof Mueller
<mueller@tk.informatik.tu-darmstadt.de> wrote:
> Jörn Kottmann wrote:
>> I am also interested in a Lucene CAS consumer.
>> Maybe we can work together and set up a sandbox project ?
>>
>> Jörn
> Hi Jörn,
>
> we would be happy to contribute the code of the example Lucene CAS
> consumer as base for the sandbox project.
>
> Christof
>
I've got an index!!!!
Yes, by mixing some code from the JENA lucas (I kept it in a dusty corner
of my hard disk :) ), some from DK and some of my own, I can produce an index.
If we want to start a Lucene indexer that is not only a proof of
concept but something really useful, it should be
configurable/extensible.
The "problem", which is also UIMA's strength, is that everyone has their
own type system.
To produce a Lucene document, one extracts information from some
features and applies the right analyzer to each. In my case I use maybe
only 10% of the annotations produced by the analysis pipeline to produce
a single Lucene doc.
So we need a highly configurable component, able to map only certain
declared features, apply the right analyzer to each, and so on.
Many approaches are possible:
- completely programmatic: the indexer is abstract and must be
  extended to implement the right mapping for a specific type system
  and pipeline
- configurable: mapping rules are defined in a descriptor file; the
  JENA component followed this approach
- a mix of the two: some mappings are configured, others are implemented
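
To make the third option a bit more concrete, here is a rough sketch in
plain Java, with no UIMA or Lucene dependency. Every name in it
(MappingSketch, FeatureMapping, Annotation, MappingIndexer, customize) is
hypothetical and only for illustration; the "analyzer" is just a function
on strings, and the "document" is a map from field name to values. In a
real component the rules would probably be read from a descriptor file
rather than built in code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these types come from UIMA or Lucene.
class MappingSketch {

    /** One configured rule: copy one feature of one annotation type into a field. */
    static final class FeatureMapping {
        final String annotationType;
        final String feature;
        final String field;
        final Function<String, String> analyzer; // stand-in for a real analyzer

        FeatureMapping(String annotationType, String feature, String field,
                       Function<String, String> analyzer) {
            this.annotationType = annotationType;
            this.feature = feature;
            this.field = field;
            this.analyzer = analyzer;
        }
    }

    /** Stand-in for an annotation: a type name plus its feature values. */
    static final class Annotation {
        final String type;
        final Map<String, String> features;

        Annotation(String type, Map<String, String> features) {
            this.type = type;
            this.features = features;
        }
    }

    /** Applies the configured rules, then gives subclasses a programmatic hook. */
    static class MappingIndexer {
        private final List<FeatureMapping> rules;

        MappingIndexer(List<FeatureMapping> rules) {
            this.rules = rules;
        }

        Map<String, List<String>> toDocument(List<Annotation> annotations) {
            Map<String, List<String>> doc = new LinkedHashMap<>();
            for (Annotation a : annotations) {
                for (FeatureMapping r : rules) {
                    if (!r.annotationType.equals(a.type)) continue; // undeclared type: skip
                    String value = a.features.get(r.feature);
                    if (value == null) continue;                    // feature not set: skip
                    doc.computeIfAbsent(r.field, k -> new ArrayList<>())
                       .add(r.analyzer.apply(value));
                }
                customize(a, doc);
            }
            return doc;
        }

        /** Override for mappings that cannot be expressed as rules; no-op by default. */
        protected void customize(Annotation a, Map<String, List<String>> doc) {
        }
    }

    public static void main(String[] args) {
        List<FeatureMapping> rules = Arrays.asList(
            new FeatureMapping("Person", "name", "person", String::toLowerCase));

        // The "mix": configured rules plus one hand-written mapping.
        MappingIndexer indexer = new MappingIndexer(rules) {
            @Override
            protected void customize(Annotation a, Map<String, List<String>> doc) {
                if ("Person".equals(a.type)) {
                    doc.computeIfAbsent("hasPerson", k -> new ArrayList<>()).add("true");
                }
            }
        };

        List<Annotation> annotations = Arrays.asList(
            new Annotation("Person", Collections.singletonMap("name", "Roberto")),
            new Annotation("Token", Collections.singletonMap("lemma", "index")));

        // Token has no rule, so it contributes nothing to the document.
        System.out.println(indexer.toDocument(annotations));
        // prints {person=[roberto], hasPerson=[true]}
    }
}
```

With an empty customize() this is the purely configurable option; with an
empty rule list and everything done in customize() it is the purely
programmatic one.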
My 2€cents.
Regards,
Roberto
--
Roberto Franchini
http://www.celi.it
http://www.blogmeter.it
http://www.memesphere.it
Tel +39-011-6600814
jabber:ro.franchini@gmail.com skype:ro.franchini