uima-user mailing list archives

From Joachim Wermter <Joachim.Werm...@uni-jena.de>
Subject Re: Lucene cas consumer
Date Tue, 09 Dec 2008 08:51:16 GMT
Dear UIMA-Users,

at the JULIE Lab, we have been working (silently) on a new, completely
reworked version of our Lucene CAS Indexer consumer (LUCAS). We plan to
make it available soon, preferably in the UIMA sandbox.
LUCAS can now perform offset-based token stream alignment and merge UIMA
annotations (via position increments) into the same Lucene field (e.g.
"documenttext" or "title"). We feel this is more appropriate for text
indexing than putting each UIMA annotation into a separate field, as in
the Solr approach (which is still possible with the new LUCAS).
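
To illustrate the idea of offset-based merging, here is a minimal sketch in
plain Python. It is not LUCAS code and does not use the Lucene API; the names
Annotation and merge_streams are hypothetical. It assumes the usual Lucene
convention that a position increment of 0 places a term at the same position
as the previous term, and it treats two annotations as aligned when they
share the same begin offset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical stand-in for a UIMA annotation: character offsets plus an index term."""
    begin: int
    end: int
    term: str


def merge_streams(*streams):
    """Merge several annotation streams into one token stream for a single field.

    Returns a list of (term, position_increment, begin, end) tuples.
    Annotations are ordered by offset; an annotation starting at the same
    offset as the previous one gets increment 0 (stacked at the same
    position), otherwise increment 1.
    """
    # sorted() is stable, so on equal offsets earlier streams come first.
    merged = sorted(
        (a for stream in streams for a in stream),
        key=lambda a: (a.begin, a.end),
    )
    result = []
    previous_begin = None
    for a in merged:
        increment = 0 if a.begin == previous_begin else 1
        result.append((a.term, increment, a.begin, a.end))
        previous_begin = a.begin
    return result


# Example text: "IL-2 binds receptor"
tokens = [
    Annotation(0, 4, "IL-2"),
    Annotation(5, 10, "binds"),
    Annotation(11, 19, "receptor"),
]
entities = [
    Annotation(0, 4, "GENE"),
    Annotation(11, 19, "PROTEIN"),
]

for term, increment, begin, end in merge_streams(tokens, entities):
    print(f"{term}\tincrement={increment}\toffsets={begin}-{end}")
```

In this sketch "GENE" ends up at the same position as "IL-2", so a phrase
query over the field could match either the surface token or the entity
label at that position.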

For the user, the central piece is a flexible XML-based "mapping
configuration file" in which the user determines which UIMA annotations
are put into which Lucene field, and how that field is set up (e.g.
tokenized or stored). In addition, some basic functionality for hypernym
indexing is provided. A sample mapping file is appended to illustrate this.
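
As a rough illustration of what such a mapping expresses, the following
Python sketch models it as plain data. This is not the actual XML format
(see the appended sample file for that); MAPPING, HYPERNYMS, fields_for and
expand_with_hypernyms are hypothetical names, and placing hypernyms at the
same position as the original term is an assumption made for this example.

```python
# Hypothetical mapping: Lucene field name -> annotation types routed into it,
# plus the field options (tokenized / stored).
MAPPING = {
    "documenttext": {"types": ["Token", "Gene"], "tokenized": True, "stored": False},
    "title": {"types": ["Token"], "tokenized": True, "stored": True},
}

# Hypothetical hypernym table: term -> broader terms to index alongside it.
HYPERNYMS = {
    "interleukin-2": ["interleukin", "cytokine"],
}


def fields_for(annotation_type):
    """Return the names of all fields that receive the given annotation type."""
    return [name for name, cfg in MAPPING.items() if annotation_type in cfg["types"]]


def expand_with_hypernyms(term):
    """Return (term, position_increment) pairs for a term and its hypernyms.

    The term itself advances the position by 1; each hypernym is stacked at
    the same position with increment 0 (an assumption for this sketch).
    """
    return [(term, 1)] + [(h, 0) for h in HYPERNYMS.get(term, [])]


print(fields_for("Token"))
print(expand_with_hypernyms("interleukin-2"))
```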

What we lack at the moment is thorough documentation of the code and,
more critically, a DTD describing the mapping (we will try to deliver this
as soon as possible).

By putting this into the sandbox, we hope the UIMA community will
embrace this tool and help to develop it further. Any immediate feedback
will be very welcome!

Best wishes,
Rico Landefeld
Joachim Wermter


-- 
Jena University Language and Information Engineering (JULIE) Lab
Phone: +49-3641-944324
Fax:   +49-3641-944321
Web:   http://www.julielab.de

