Mailing-List: contact uima-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: uima-user@incubator.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----_=_NextPart_001_01C956B6.3AD81742"
Subject: RE: Lucene cas consumer
Date: Fri, 5 Dec 2008 09:44:56 +0100
Message-ID: <205239E4006D14469A2D6CEC72AE925401B73481@EXV01001.GlobalSP.local>
Thread-Topic: Lucene cas consumer
Thread-Index: AclWQEjFmNpfHCDnTq+Mi6EyYN4BOwAcpNdA
From: "Olivier Terrier" <olivier.terrier@temis.com>
To: <uima-user@incubator.apache.org>
Cc: "Roberto Franchini" <ro.franchini@gmail.com>

------_=_NextPart_001_01C956B6.3AD81742
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Hi all

We at Temis have also built a prototype integration of Lucene and UIMA
as a proof of concept.
More precisely, we have written a Solr CAS consumer.
Solr (http://lucene.apache.org/solr/) is a Lucene subproject that
provides an indexing server layer on top of Lucene.
The idea was to index documents through a UIMA processing chain, with
both the full text and the entities derived from UIMA annotations.
Moreover, Solr supports 'faceted search', which can be based on those
annotations.
Suppose you have a UIMA type system that defines annotations such as
Person, Company, Location, etc. You can easily index these entities
into a Lucene index using the Solr Java API.
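
As a rough illustration of that mapping (not our actual consumer code),
the sketch below flattens a document's text and its entity annotations
into a Solr-style document with one multi-valued field per annotation
type. It is plain Python with no UIMA or Solr dependency; the Annotation
class, the build_index_document helper and the field names (id, text,
person, ...) are assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a UIMA annotation: a typed span of text."""
    type_name: str  # e.g. "Person", "Company", "Location"
    begin: int
    end: int


def build_index_document(doc_id, text, annotations):
    """Flatten text plus annotations into a dict of index fields.

    The full text goes into a single "text" field; the covered text of each
    annotation goes into a multi-valued field named after its type.
    """
    entity_fields = defaultdict(list)
    for ann in sorted(annotations, key=lambda a: a.begin):
        covered = text[ann.begin:ann.end]
        field = ann.type_name.lower()
        if covered not in entity_fields[field]:  # keep each value once
            entity_fields[field].append(covered)
    document = {"id": doc_id, "text": text}
    document.update(entity_fields)
    return document


# Made-up sample data; offsets are character positions in `text`.
text = "Alice works at Acme in Berlin."
annotations = [
    Annotation("Person", 0, 5),
    Annotation("Company", 15, 19),
    Annotation("Location", 23, 29),
]
doc = build_index_document("doc-1", text, annotations)
print(doc)
```

With the Solr Java API, each key/value pair would typically become a
field on the document sent to the server, and the entity fields would
need to be declared multi-valued in the Solr schema.
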
In the prototype we also used a Solr contribution (not yet integrated
into the trunk) named solr-ui, available here:
https://issues.apache.org/jira/browse/SOLR-634
It provides a simple UI for searching your indexed documents using a
combination of full text and facets (see the attached screenshot).
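
To make "full text plus facets" concrete, here is a small in-memory
sketch of the idea. It only imitates the behaviour: in a real setup Solr
does this on the server, driven by request parameters such as
facet=true, facet.field and fq. The sample documents and the search and
facet_counts helpers are hypothetical, and the field names are invented
for the example.

```python
from collections import Counter

# Hypothetical indexed documents: full text plus multi-valued entity fields.
DOCS = [
    {"id": "1", "text": "Acme opens an office in Berlin",
     "company": ["Acme"], "location": ["Berlin"]},
    {"id": "2", "text": "Alice joins Acme in Paris",
     "person": ["Alice"], "company": ["Acme"], "location": ["Paris"]},
    {"id": "3", "text": "Bob visits Berlin",
     "person": ["Bob"], "location": ["Berlin"]},
]


def search(docs, term=None, filters=None):
    """Return docs whose text contains `term` as a word and that match
    every facet filter (field -> required value)."""
    filters = filters or {}
    hits = []
    for doc in docs:
        if term and term.lower() not in doc["text"].lower().split():
            continue
        if all(value in doc.get(field, []) for field, value in filters.items()):
            hits.append(doc)
    return hits


def facet_counts(hits, field):
    """Count how many hits carry each value of `field`."""
    counts = Counter()
    for doc in hits:
        counts.update(set(doc.get(field, [])))
    return counts


# Full-text query first, then narrow the result by clicking on a facet value.
hits = search(DOCS, term="acme")
print([d["id"] for d in hits])
print(facet_counts(hits, "location"))

narrowed = search(DOCS, term="acme", filters={"location": "Paris"})
print([d["id"] for d in narrowed])
```

The facet counts are computed over the current hits, which is what lets
a UI show how many results remain under each entity value before the
user narrows the search.
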
Of course, our Solr consumer is for now a very basic piece of code: for
example, it is tightly coupled to our own type system. We would be more
than happy to collaborate with the community on this if there is
interest.

Regards

Olivier Terrier
Temis

> -----Original Message-----
> From: Niels Ott [mailto:nott@sfs.uni-tuebingen.de]
> Sent: Thursday, 4 December 2008 19:37
> To: uima-user@incubator.apache.org
> Cc: Roberto Franchini
> Subject: Re: Lucene cas consumer
>=20
> Hi all,
>=20
> I'm using both Lucene and UIMA in one project.
>=20
> Lucene is primarily an information retrieval API. It provides=20
> a framework and default implementations for analyzing several=20
> languages.
> Analyzing means tokenization, stop words, etc. Furthermore,=20
> it brings the key functionality to build an inverted index=20
> and to search it.
>=20
> Lucene can be extended easily. E.g. one can implement an=20
> analyzer that does lemmatization or that looks up synonyms in=20
> WordNet and adds them to the index.
>=20
> What Lucene cannot do - or at least not without a lot of=20
> hacking - is aggregating analyses as UIMA can using the CAS.=20
> Usually your knowledge grows during a UIMA-based=20
> NLP pipeline: you add a token annotation, a lemma=20
> annotation, a POS annotation and so on...  In Lucene, you=20
> have the classical pipeline: the output replaces the input.=20
> (Yes, by subclassing Lucene's "Token" class, one can fiddle=20
> around the issue, but it is not elegant at all.)
>=20
> What makes Lucene + UIMA interesting for me is a simple fact:=20
> I can do all the NLP I want and be as flexible as I need in=20
> UIMA. Then I can feed the outcome (or rather: a small part of=20
> it) into a Lucene index.
>=20
> In my special case, I'm not using a CAS Consumer, but I can=20
> imagine other people would appreciate it in their application=20
> scenarios.
>=20
> To conclude: Lucene and UIMA aren't competitors, but in some=20
> cases having one feeding the other is what you want.
>=20
> Best,
>=20
>     Niels
>=20
>=20
> Greg Holmberg wrote:
> > Roberto--
> >=20
> > It does seem like there should be a close relationship between the
> > two groups.
> >=20
> > I don't know much about Lucene--can you educate me?  For example,
> > have you given any thought to what to do with UIMA annotations?
> > From what little I've read about Lucene, they seem to have a thing
> > called a document analyzer, but they don't mean the same thing we
> > mean by analysis in the NLP community.  They appear to mean
> > something more like "tokenizer".  So I haven't yet found a place to
> > put UIMA annotations, say for example, named entities or parts of
> > speech.  I'm wondering if Lucene needs a major feature enhancement
> > before it's truly useful with UIMA?
> >=20
> > What are your thoughts on how to integrate the two?  What
> > functionality is possible?
> >=20
> > Greg Holmberg
> >=20
> >=20
> > -------------- Original message ----------------------
> > From: "Roberto Franchini" <ro.franchini@gmail.com>
> >> Hi, I'm going to write a Lucene CAS consumer. The purpose is to
> >> create a Lucene document, or more than one, for each CAS. Last year
> >> (2007) the Jena university lab (JULIE Lab? Is that right?) delivered
> >> such a component, named LUCAS. Then it disappeared. LUCAS seems a
> >> good piece of software. The Technische Universit=E4t Darmstadt
> >> developed one too: http://www.ukp.tu-darmstadt.de/projects/dkpro/.
> >> (I will write to them).
> >>=20
> >> Is anybody interested in sharing knowledge and/or code to build that
> >> component? I think that Lucene and UIMA can be very good friends :)
> >>=20
> >> Roberto
> >>=20
> >> PS: I apologize for my bad English.
> >>=20
> >> -- Roberto Franchini http://www.celi.it http://www.blogmeter.it=20
> >> http://www.memesphere.it Tel +39-011-6600814=20
> >> jabber:ro.franchini@gmail.com skype:ro.franchini
>=20
>=20
> --
> Niels Ott - Computational Linguist (B.A.) - http://www.drni.de/niels/
>            - My PGP key is available from your favorite key server.
>=20
> Those who live in glass houses should always have Sidolin at hand!
>=20

------_=_NextPart_001_01C956B6.3AD81742--