uima-user mailing list archives

From "Dr. Armin Wegner" <arminweg...@googlemail.com>
Subject Re: AW: AW: Lucas
Date Thu, 28 Aug 2014 07:21:24 GMT
Hello Erik,

in Lucene 4.9 (maybe earlier), you can replace the Lucene analyzer
with a UIMA pipeline. At least the docs say so. I don't know how good
it is because I've never used it.

Cheers,
Armin


On 8/26/14, Erik Fäßler <erik.faessler@uni-jena.de> wrote:
> Hi all,
>
> actually, I don't use LuCas anymore to write a Lucene index but rather to
> send the created documents to Solr or ElasticSearch. There are two reasons I
> continue to use LuCas: its field merging capabilities and the term cover
> mechanics.
> Regarding the field merging: I have a lot of machine learning components in
> my pipeline, nothing I could do within a Lucene analyzer. So when I
> recognize entities with an ML component in the text and each entity has an
> ID, then please consider this example:
>
> Barack Obama entered the White House.
>
> Let's pretend we would require an ML system to recognize "White House" as
> THE one White House and let's say we gave it the ID "entity1".
> My goal is to be able to search for the ID in the same way I would do using
> a synonym filter, thus finding a document by terms that originally were not
> included in this document's text, AND be able to correctly highlight the
> corresponding text snippet. So, when I search for "entity1" (e.g. because
> the user wants to see documents dealing with the White House), I want to
> find the above example document with the string "White House" highlighted.
> LuCas can do this for me by aligning or merging the text TokenStream with
> the entity TokenStream, just as it is done within the CAS itself.
>
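To make the merging idea concrete, here is a minimal sketch in plain Python. It is not LuCas or Lucene code: the `Token` class and the `tokenize`, `merge` and `highlight` helpers are hypothetical names chosen for illustration. The assumption is that an entity token is stacked on the text token that starts at the same character offset (position increment 0, as a synonym filter would do), while keeping the offsets of the whole entity span so that highlighting can recover the original string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    start: int        # character offset, inclusive
    end: int          # character offset, exclusive
    pos_inc: int = 1  # 0 means "same position as the previous token"


def tokenize(text):
    """Whitespace tokenizer that records offsets and drops trailing punctuation."""
    tokens, cursor = [], 0
    for raw in text.split():
        start = text.index(raw, cursor)
        cursor = start + len(raw)
        word = raw.rstrip(".,;:!?")
        tokens.append(Token(word, start, start + len(word)))
    return tokens


def merge(text_tokens, entity_tokens):
    """Stack each entity token on the text token that starts at the same offset."""
    merged = []
    for tok in text_tokens:
        merged.append(tok)
        for ent in entity_tokens:
            if ent.start == tok.start:
                merged.append(Token(ent.term, ent.start, ent.end, pos_inc=0))
    return merged


def highlight(text, tokens, query):
    """Bracket the span of the first token whose term equals the query."""
    for tok in tokens:
        if tok.term == query:
            return text[:tok.start] + "[" + text[tok.start:tok.end] + "]" + text[tok.end:]
    return text


text = "Barack Obama entered the White House."
begin = text.index("White House")
# Hypothetical output of an ML entity recognizer: one entity ID with its span.
entities = [Token("entity1", begin, begin + len("White House"))]

merged = merge(tokenize(text), entities)
print(highlight(text, merged, "entity1"))
# prints: Barack Obama entered the [White House].
```

Searching for "entity1" matches a term that never occurs in the text, yet the highlighted span is the original "White House" string, which is the behaviour described above.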
> If this functionality can be achieved without using LuCas, please tell me,
> I'd be happy to switch to up-to-date maintained default-components. Until
> now I am under the impression this cannot be done by another component.
>
> The term cover mechanics allow me to easily distribute terms across document
> fields in a predefined, possibly overlapping, set division, the set cover. I
> use it to automatically deal with a lot of faceting fields. Here, I can
> model n:n mappings from CAS indexes to Lucene fields, e.g. mapping terms
> originating from one CAS index to 10 Lucene fields, or the other way round.
> Again, if this is easily possible with another existing, maintained
> component, please point me to it.
>
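As a rough illustration of the term cover idea, the following plain-Python sketch distributes terms over fields according to a predefined cover whose subsets may overlap. It does not reflect LuCas's actual configuration format; the `COVER` table, the field names and the `distribute` helper are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical cover: each term is assigned to one or more fields.
# The subsets may overlap, so a term can land in several fields.
COVER = {
    "entity1": ["facet_buildings", "facet_government"],
    "entity2": ["facet_persons"],
}


def distribute(terms, cover, default_field="text"):
    """Place every term into each field the cover assigns to it."""
    fields = defaultdict(list)
    for term in terms:
        for field in cover.get(term, [default_field]):
            fields[field].append(term)
    return dict(fields)


doc = distribute(["entity2", "entered", "entity1"], COVER)
print(doc)
```

Terms without an entry in the cover fall back to a single default field here; that fallback is a choice of this sketch, not something stated in the thread.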
> In short: I, too, ultimately don't use Lucene but Solr/ES. However, LuCas
> has some (Lucene) document fine-tuning capabilities I need/work
> with.
> This means: I don't necessarily need LuCas in a Lucene-updated version. I
> use it more as a fine-tuned TokenStream-smith. I might need it to be
> updated in the future if LuCas is not able to express a specific feature
> of a newer Lucene version.
>
> I hope this wall of text was understandable, thanks for reading through it
> ;-)
>
> Best,
>
> Erik
>
>
>
>> On 26 Aug 2014, at 09:43, <Armin.Wegner@bka.bund.de> wrote:
>>
>> Hi Erik and Jörn,
>>
>> I've used Solr in the meantime. It is so easy to quickly write a CAS
>> consumer that sends documents to a Solr web service. Writing to a Lucene
>> index is minimally more work. Could this be the reason why nobody cares
>> about the outdated version? Is there really a need for Lucas and Solrcas
>> anymore? What do you think? It would be nice to have some opinions on
>> this.
>>
>> Of all people reading this list, who wants to have a Lucas or Solrcas for
>> the current version of Lucene?
>>
>> Cheers,
>> Armin
>>
>> -----Original Message-----
>> From: Erik Fäßler [mailto:erik.faessler@uni-jena.de]
>> Sent: Friday, 22 August 2014 16:34
>> To: user@uima.apache.org
>> Subject: Re: AW: Lucas
>>
>> I am using LuCas in production, in the last SNAPSHOT version, which can be
>> found in the SVN but not in the Maven repository. I was also not aware that a
>> patch would be required to get it to work; I am using it in its current
>> SVN state, including the splitter filter.
>> I would be willing to help with a migration and contribute to
>> discussions/plans. However, I won't have time to do it all on my own,
>> especially since I use it as a bridge to Solr/ElasticSearch that kind of
>> remedies the version difference. Thus I use it with newer Solr/ES versions
>> without problems so far.
>>
>> I will be on vacation for two weeks, after that I'd be available for
>> contributions.
>>
>> Best,
>>
>> Erik
>>
>>> On 22 Aug 2014, at 15:36, Jörn Kottmann <kottmann@gmail.com> wrote:
>>>
>>> It would probably be nice to migrate those to the current versions of
>>> Lucene/Solr.
>>>
>>> Jörn
>>>
>>>> On 08/13/2014 08:44 AM, Armin.Wegner@bka.bund.de wrote:
>>>> Hi Renaud,
>>>>
>>>> that's nice, thank you. Are you using Lucene 4.x or an older version?
>>>>
>>>> It's been a while since I asked that question, and I didn't get much
>>>> response. Is the project dead? Is it just too easy to code a simple
>>>> annotator for Lucene or Solr to justify the effort of maintaining Lucas
>>>> and Solrcas?
>>>>
>>>> Cheers,
>>>> Armin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Renaud Richardet [mailto:renaud.richardet@epfl.ch]
>>>> Sent: Monday, 11 August 2014 23:12
>>>> To: user@uima.apache.org
>>>> Subject: Re: Lucas
>>>>
>>>> Hi Armin,
>>>>
>>>> I used it a while ago. I had to apply the following patch to make it
>>>> work:
>>>> https://gist.github.com/renaud/bc34a48ca22f787f6c11
>>>>
>>>> HTH, Renaud
>>>>
>>>>
>>>>> On Mon, Jul 28, 2014 at 2:55 PM, <Armin.Wegner@bka.bund.de> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Is someone using Lucas? It seems to be slightly outdated. It depends
>>>>> on Lucene 2.9.3. Lucene is at version 4.9.0 right now. Is there an
>>>>> alternative?
>>>>>
>>>>> Regards,
>>>>> Armin
>>>>
>>>> --
>>>> Renaud Richardet
>>>> Blue Brain Project  PhD candidate
>>>> EPFL  Station 15
>>>> CH-1015 Lausanne
>>>> phone: +41-78-675-9501
>>>> http://people.epfl.ch/renaud.richardet
>>>
>
