uima-user mailing list archives

From Erik Fäßler <erik.faess...@uni-jena.de>
Subject Re: AW: AW: Lucas
Date Tue, 26 Aug 2014 13:58:41 GMT
Hi all,

actually, I don't use LuCas anymore to write a Lucene index but rather to send the created
documents to Solr or ElasticSearch. There are two reasons I continue to use LuCas: its field
merging capabilities and the term cover mechanics.
Regarding the field merging: I have a lot of machine learning components in my pipeline that
do things I could not do within a Lucene analyzer, for instance recognizing entities in the text
and assigning each entity an ID. Please consider this example:

Barack Obama entered the White House.

Let's pretend we would require an ML system to recognize "White House" as THE one White House
and let's say we gave it the ID "entity1".
My goal is to be able to search for the ID in the same way I would do using a synonym filter,
thus finding a document by terms that originally were not included in this document's text,
AND be able to correctly highlight the corresponding text snippet. So, when I search for "entity1"
(e.g. because the user wants to see documents dealing with the White House), I want to find
the above example document with the string "White House" highlighted.
LuCas can do this for me by aligning or merging the text TokenStream with the entity TokenStream,
just as it is done within the CAS itself.
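
To make the merging idea concrete, here is a minimal sketch in plain Python. It is not LuCas or Lucene code; the Token class and the tokenize, merge and highlight helpers are hypothetical names made up for this illustration. The assumption is that the entity ID is stacked on the first token of the entity (position increment 0, as a synonym filter would do) while keeping the character offsets of the whole entity, so that a hit on the ID can be highlighted in the original text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    start: int    # character offset where the covered text begins
    end: int      # character offset just past the covered text
    pos_inc: int  # 1 = new position, 0 = stacked on the previous token


def tokenize(text):
    """Word tokenizer that lowercases terms and records character offsets."""
    return [Token(m.group().lower(), m.start(), m.end(), 1)
            for m in re.finditer(r"\w+", text)]


def merge(text_tokens, entities):
    """Merge entity annotations (entity_id, start, end) into the token list.

    Each entity ID is emitted right after the text token that starts at the
    same offset, with position increment 0 and the offsets of the full entity.
    """
    by_start = {}
    for entity_id, start, end in entities:
        by_start.setdefault(start, []).append((entity_id, end))
    merged = []
    for tok in text_tokens:
        merged.append(tok)
        for entity_id, end in by_start.get(tok.start, []):
            merged.append(Token(entity_id, tok.start, end, 0))
    return merged


def highlight(text, merged, query):
    """Bracket the text span covered by the first token matching the query."""
    for tok in merged:
        if tok.term == query:
            return text[:tok.start] + "[" + text[tok.start:tok.end] + "]" + text[tok.end:]
    return text


text = "Barack Obama entered the White House."
start = text.index("White House")
entities = [("entity1", start, start + len("White House"))]
merged = merge(tokenize(text), entities)
print(highlight(text, merged, "entity1"))
```

Searching for "entity1" then brackets the span "White House", although the term "entity1" never occurs in the document text itself.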

If this functionality can be achieved without using LuCas, please tell me, I'd be happy to
switch to up-to-date maintained default-components. Until now I am under the impression this
cannot be done by another component.

The term cover mechanics allow me to easily distribute terms across document fields in a predefined,
possibly overlapping, set division, the set cover. I use it to automatically deal with a lot
of faceting fields. Here, I can model n:n mappings from CAS indexes to Lucene fields, e.g.
mapping terms originating from one CAS index to 10 Lucene fields, or the other way round.
Again, if this is easily possible with another existing, maintained component, please point
me to it.
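
As a rough illustration of the term cover idea, the following plain Python sketch distributes terms over fields according to a predefined, possibly overlapping mapping. This is not the LuCas configuration format or API; the distribute function, the field names and the choice to drop terms that the cover does not mention are all assumptions made for this example.

```python
from collections import defaultdict


def distribute(terms, cover, default_fields=()):
    """Distribute terms over fields according to a term cover.

    cover maps a term to the field names it belongs to; one term may go to
    several fields and one field may receive several terms (n:n). Terms that
    the cover does not mention go to default_fields (none by default).
    """
    fields = defaultdict(list)
    for term in terms:
        for field in cover.get(term, default_fields):
            fields[field].append(term)
    return dict(fields)


# Hypothetical cover: entity IDs mapped to facet fields.
cover = {
    "entity1": ["facet_buildings", "facet_politics"],
    "entity2": ["facet_politics"],
}
print(distribute(["entity1", "entity2", "entity3"], cover))
```

Here "entity1" lands in two facet fields, "facet_politics" receives two terms, and "entity3" is dropped because the cover does not mention it.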

In short: I, too, ultimately don't use Lucene but Solr/ES. However, LuCas has some (Lucene)
document fine-tuning capabilities I need/work with.
This means: I don't necessarily need LuCas in a Lucene-updated version. I use it more as
a fine-tuned TokenStream-smith. I might need it to be updated in the future if LuCas
is not able to express a specific feature of a newer Lucene version.

I hope this wall of text was understandable, thanks for reading through it ;-)



> On 26 Aug 2014, at 09:43, <Armin.Wegner@bka.bund.de> wrote:
> Hi Erik and Jörn,
> I've used Solr in the meantime. It is so easy to quickly write a CAS consumer that sends
> documents to a Solr web service. Writing to a Lucene index is minimally more work. Could this
> be the reason why nobody cares about the outdated version? Is there really a need for Lucas
> and Solrcas anymore? What do you think? It would be nice to have some opinions on this.
> Of all people reading this list, who wants to have a Lucas or Solrcas for the current
> version of Lucene?
> Cheers,
> Armin
> -----Original Message-----
> From: Erik Fäßler [mailto:erik.faessler@uni-jena.de]
> Sent: Friday, 22 August 2014 16:34
> To: user@uima.apache.org
> Subject: Re: AW: Lucas
> I am using LuCas in production, in the last SNAPSHOT version, which can be found in the
> SVN but not in the Maven repository. I was also not aware that a patch would be required to get
> it to work; I am using it in its current SVN state, including the splitter filter.
> I would be willing to help with a migration and contribute to discussions/plans. However,
> I won't have time to do it all on my own, especially since I use it as a bridge to Solr/ElasticSearch,
> which kind of remedies the version difference. Thus I have used it with newer Solr/ES versions without
> problems so far.
> I will be on vacation for two weeks; after that I'd be available for contributions.
> Best,
> Erik
>> On 22 Aug 2014, at 15:36, Jörn Kottmann <kottmann@gmail.com> wrote:
>> It would probably be nice to migrate those to the current versions of Lucene/Solr.
>> Jörn
>>> On 08/13/2014 08:44 AM, Armin.Wegner@bka.bund.de wrote:
>>> Hi Renaud,
>>> that's nice, thank you. Are you using Lucene 4.x or an older version?
>>> It's been a while since I asked that question, and I didn't get much response.
>>> Is the project dead? Is it just too easy to code a simple annotator for Lucene or Solr to justify
>>> the effort of maintaining Lucas and Solrcas?
>>> Cheers,
>>> Armin
>>> -----Original Message-----
>>> From: Renaud Richardet [mailto:renaud.richardet@epfl.ch]
>>> Sent: Monday, 11 August 2014 23:12
>>> To: user@uima.apache.org
>>> Subject: Re: Lucas
>>> Hi Armin,
>>> I used it a while ago. I had to apply the following patch to make it work:
>>> https://gist.github.com/renaud/bc34a48ca22f787f6c11
>>> HTH, Renaud
>>>> On Mon, Jul 28, 2014 at 2:55 PM, <Armin.Wegner@bka.bund.de> wrote:
>>>> Hi!
>>>> Is someone using Lucas? It seems to be slightly outdated. It depends 
>>>> on Lucene 2.9.3. Lucene is at version 4.9.0 right now. Is there an alternative?
>>>> Regards,
>>>> Armin
>>> --
>>> Renaud Richardet
>>> Blue Brain Project  PhD candidate
>>> EPFL  Station 15
>>> CH-1015 Lausanne
>>> phone: +41-78-675-9501
>>> http://people.epfl.ch/renaud.richardet
