uima-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Lucene cas consumer
Date Thu, 11 Dec 2008 20:51:44 GMT
Coming late to the conversation... Just offering some Lucene perspective.

On Dec 4, 2008, at 1:36 PM, Niels Ott wrote:

> What Lucene cannot do - or at least not without a lot of hacking - is
> aggregate analyses the way UIMA can using the CAS. Usually your knowledge
> grows during a UIMA-based NLP pipeline: you add a token annotation,
> a lemma annotation, a POS annotation and so on... In Lucene, you have
> the classical pipeline: the output replaces the input. (Yes, by
> subclassing Lucene's "Token" class, one can fiddle around the issue, but
> it is not elegant at all.)
>

You might find the TeeTokenFilter and SinkTokenizer interesting for  
mapping/aggregating tokens/extractions out to other fields in Lucene.
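
To make the tee/sink idea concrete, here is a minimal sketch in plain Java. It does not use Lucene's API: the Tee class, the looksLikeName check and the drain helper are hypothetical names made up for illustration. It only shows the pattern, where the main stream passes through unchanged while selected tokens are copied to a side collection that could feed another field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of the tee/sink pattern (NOT Lucene's classes).
final class TeeSinkSketch {

    /** Passes every token through unchanged and copies matching ones into a sink. */
    public static final class Tee implements Iterator<String> {
        private final Iterator<String> input;
        private final List<String> sink;

        public Tee(Iterator<String> input, List<String> sink) {
            this.input = input;
            this.sink = sink;
        }

        @Override
        public boolean hasNext() {
            return input.hasNext();
        }

        @Override
        public String next() {
            String token = input.next();
            if (looksLikeName(token)) {
                sink.add(token); // side copy, e.g. for a separate "names" field
            }
            return token; // the main stream is not altered
        }
    }

    /** Hypothetical extraction rule: a token starting with an upper-case letter. */
    public static boolean looksLikeName(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    /** Consumes a token stream into a list, as an indexer would consume it. */
    public static List<String> drain(Iterator<String> tokens) {
        List<String> out = new ArrayList<>();
        while (tokens.hasNext()) {
            out.add(tokens.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        Iterator<String> source = Arrays.asList("Grant", "likes", "Lucene").iterator();
        List<String> body = drain(new Tee(source, names));
        System.out.println("body field:  " + body);
        System.out.println("names field: " + names);
    }
}
```

Note that the sink only fills up once the main stream has been consumed, so the field fed by the tee has to be processed before the field fed by the sink.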

Also, Lucene is getting more flexible in terms of indexing and
searching. You can attach payloads (i.e. byte arrays) to terms, which
can provide some crude annotation storage.
https://issues.apache.org/jira/browse/LUCENE-1422 and a couple of
other issues are the start of more flexibility to add attributes that
can then be indexed. We're still working on the search side of it, but
I think you will see more in the way of flexible indexing in the
coming months, which should be a nice win for UIMA + Lucene users.
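
Since a payload is just a byte array, storing an annotation in one comes down to picking a byte layout. The sketch below is plain Java with an assumed, hypothetical layout (a 4-byte confidence value followed by a UTF-8 POS tag). The class and method names are illustrative, and handing the array to Lucene is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical payload layout: [4-byte float confidence][UTF-8 POS tag].
final class PayloadSketch {

    /** Packs a POS tag and a confidence score into one byte array. */
    public static byte[] encode(String posTag, float confidence) {
        byte[] tag = posTag.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + tag.length);
        buf.putFloat(confidence);
        buf.put(tag);
        return buf.array();
    }

    /** Reads the POS tag back from the bytes after the 4-byte float. */
    public static String decodeTag(byte[] payload) {
        return new String(payload, 4, payload.length - 4, StandardCharsets.UTF_8);
    }

    /** Reads the confidence score from the first four bytes. */
    public static float decodeConfidence(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] payload = encode("NN", 0.5f);
        System.out.println(decodeTag(payload) + " " + decodeConfidence(payload));
    }
}
```

This is "crude" in exactly the sense above: the index sees opaque bytes, so whatever reads the payload back at search time has to know the layout.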



> What makes Lucene + UIMA interesting for me is a simple fact: I can do
> all the NLP I want and be as flexible as I need in UIMA. Then I can feed
> the outcome (or rather: a small part of it) into a Lucene index.
>
> In my special case, I'm not using a CAS Consumer, but I can imagine
> other people would appreciate it in their application scenarios.
>
> To conclude: Lucene and UIMA aren't competitors, but in some cases  
> having one feeding the other is what you want.

Couldn't agree more!

Cheers,
Grant
