incubator-general mailing list archives

From Thilo Goetz <twgo...@gmx.de>
Subject Re: Proposal for a new incubation project: Unstructured Information Management Architecture - UIMA
Date Sun, 27 Aug 2006 08:24:26 GMT
Yonik Seeley wrote:
> On 8/26/06, Thilo Goetz <twgoetz@gmx.de> wrote:
>>  From an application perspective, we have great hopes for a cooperation
>> with the Lucene project.
> 
> Great, I think this is something I'd like to get involved in!
> I've been thinking about how Solr integration could work.
> 
>> You then also need a search engine that
>> can index that extra information and make it available for search.
> 
> Without getting into too much detail here, some info could be
> immediately usable by Lucene based apps (like entity extraction, where
> you can add info via a new field in the document).  Parts-of-speech
> type of stuff is currently more difficult of course.
> 
> -Yonik

I agree (with all of the above ;-).  Where it gets really interesting is 
with queries like "show me all documents with book references whose 
author's last name is Knuth (highlighting the reference in the 
summary)."  One might be able to create such a system based on a text 
search engine with special fields and some sophisticated query 
expansion, but it would be a lot easier if one had special support for 
"embedded structures" in the index -- like you need for XML indexing.
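
To illustrate the difference, here is a minimal sketch in plain Python. It is a toy model only: the Annotation and Document classes, the feature name "author_last", and both search functions are made up for this example and are not part of the Lucene or UIMA APIs. It compares a flat per-document field index with a search that keeps each annotation's features and text span together.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str            # hypothetical type name, e.g. "BookReference"
    begin: int           # character offsets into the document text
    end: int
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: list


def annotate(text, phrase, type_, **features):
    """Build an annotation covering the first occurrence of phrase in text."""
    begin = text.index(phrase)
    return Annotation(type_, begin, begin + len(phrase), features)


def flat_fields(doc):
    """Flatten annotations into per-document fields; the nesting is lost."""
    fields = {"type": set(), "author_last": set()}
    for a in doc.annotations:
        fields["type"].add(a.type)
        if "author_last" in a.features:
            fields["author_last"].add(a.features["author_last"])
    return fields


def flat_search(docs, ref_type, last_name):
    """Match documents where both values occur anywhere in the document."""
    hits = []
    for d in docs:
        f = flat_fields(d)
        if ref_type in f["type"] and last_name in f["author_last"]:
            hits.append(d.doc_id)
    return hits


def structured_search(docs, ref_type, last_name):
    """Match only when type and author belong to the same annotation,
    and return the covered text so it can be highlighted."""
    hits = []
    for d in docs:
        for a in d.annotations:
            if a.type == ref_type and a.features.get("author_last") == last_name:
                hits.append((d.doc_id, d.text[a.begin:a.end]))
    return hits


T1 = "See Knuth, The Art of Computer Programming, for details."
T2 = "Lamport's LaTeX book cites an article by Knuth."

DOCS = [
    Document("d1", T1, [
        annotate(T1, "Knuth, The Art of Computer Programming",
                 "BookReference", author_last="Knuth"),
    ]),
    Document("d2", T2, [
        annotate(T2, "Lamport's LaTeX book",
                 "BookReference", author_last="Lamport"),
        annotate(T2, "an article by Knuth",
                 "ArticleReference", author_last="Knuth"),
    ]),
]

print(flat_search(DOCS, "BookReference", "Knuth"))
print(structured_search(DOCS, "BookReference", "Knuth"))
```

The flat search also returns d2, because "BookReference" and "Knuth" both occur somewhere in that document even though they belong to different references. The structured search returns only d1, together with the span to highlight. A dedicated field such as "bookref_author_last" would remove the false hit in this small case, but the span is still lost, and every new combination of type and feature needs its own field.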

I'll be happy to continue this discussion over on solr-dev or wherever 
is appropriate.

--Thilo

