In-Reply-To: <CA98C3C2-F4F9-475B-873F-DED60041A0C5@uni-jena.de>
References: <A734E21E566ADB40A70F2EBC5D3C2EB01581C47D@SWMMBX12.bk.bka.bund.de>
	<CAK45zkuugCJtDfHktoo6O9GZbG1Ev7ggm+iPFDEr548e94ZSxA@mail.gmail.com>
	<A734E21E566ADB40A70F2EBC5D3C2EB015829FC6@SWMMBX12.bk.bka.bund.de>
	<53F74764.40306@gmail.com>
	<317F1011-4213-4F13-A6E6-FB6D2EDC1C8E@uni-jena.de>
	<A734E21E566ADB40A70F2EBC5D3C2EB01582A312@SWMMBX12.bk.bka.bund.de>
	<CA98C3C2-F4F9-475B-873F-DED60041A0C5@uni-jena.de>
Date: Thu, 28 Aug 2014 09:21:24 +0200
Message-ID: 
 <CAMofaGSsiqEbSHmWn0nH8LzAyiTzdPNOpWgqEnddwPCt4vffKQ@mail.gmail.com>
Subject: Re: AW: AW: Lucas
From: "Dr. Armin Wegner" <arminwegner@googlemail.com>
To: user@uima.apache.org
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hello Erik,

in Lucene 4.9 (maybe earlier), you can replace the Lucene analyzer
with a UIMA pipeline. At least the docs say so. I don't know how good
it is because I've never used it.

Cheers,
Armin


On 8/26/14, Erik Fäßler <erik.faessler@uni-jena.de> wrote:
> Hi all,
>
> actually, I don't use LuCas anymore to write a Lucene index but rather to
> send the created documents to Solr or ElasticSearch. There are two reasons I
> continue to use LuCas: its field merging capabilities and the term cover
> mechanics.
> Regarding the field merging: I have a lot of machine learning components in
> my pipeline, nothing I could do within a Lucene analyzer. So when I
> recognize entities with an ML component in the text and each entity has an
> ID, then please consider this example:
>
> Barack Obama entered the White House.
>
> Let's pretend we would require an ML system to recognize "White House" as
> THE one White House and let's say we gave it the ID "entity1".
> My goal is to be able to search for the ID in the same way I would do using
> a synonym filter, thus finding a document by terms that originally were not
> included in this document's text, AND be able to correctly highlight the
> corresponding text snippet. So, when I search for "entity1" (e.g. because
> the user wants to see documents dealing with the White House), I want to
> find the above example document with the string "White House" highlighted.
> LuCas can do this for me by aligning or merging the text TokenStream with
> the entity TokenStream, just as it is done within the CAS itself.
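
The merging described above can be pictured with a small, library-free sketch. It is not LuCas or Lucene code: the `Token` class, `tokenize` and `merge` are hypothetical names chosen for illustration. The idea it assumes is the one used by synonym filters: the entity ID is emitted at the same position as the text token it starts on (position increment 0), while its character offsets cover the whole mention, so a hit on "entity1" can be highlighted as "White House".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    term: str
    start: int    # character offset where the covered text begins
    end: int      # character offset just past the covered text
    pos_inc: int  # position increment; 0 stacks the token on the previous position


def tokenize(text):
    """Whitespace tokenizer that records character offsets (trailing '.' stripped)."""
    tokens, cursor = [], 0
    for raw in text.split():
        start = text.index(raw, cursor)
        cursor = start + len(raw)
        word = raw.rstrip(".")
        tokens.append(Token(word, start, start + len(word), 1))
    return tokens


def merge(text_tokens, entities):
    """Stack each entity ID on the text token where its mention begins.

    entities: list of (entity_id, start, end) character spans.
    """
    merged = []
    for tok in text_tokens:
        merged.append(tok)
        for entity_id, start, end in entities:
            if tok.start == start:
                # Same position as the text token, offsets span the full mention.
                merged.append(Token(entity_id, start, end, 0))
    return merged


text = "Barack Obama entered the White House."
mention_start = text.index("White House")
entities = [("entity1", mention_start, mention_start + len("White House"))]

stream = merge(tokenize(text), entities)
hit = next(t for t in stream if t.term == "entity1")
print(text[hit.start:hit.end])
```

Searching the merged stream for "entity1" finds a token whose offsets point back at the original mention, which is what a highlighter needs.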
>
> If this functionality can be achieved without using LuCas, please tell me,
> I'd be happy to switch to up-to-date maintained default-components. Until
> now I am under the impression this cannot be done by another component.
>
> The term cover mechanics allow me to easily distribute terms across document
> fields in a predefined, possibly overlapping, set division, the set cover. I
> use it to automatically deal with a lot of faceting fields. Here, I can
> model n:n mappings from CAS indexes to Lucene fields, e.g. mapping terms
> originating from one CAS index to 10 Lucene fields, or the other way round.
> Again, if this is easily possible with another existing, maintained
> component, please point me to it.
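
A rough sketch of the term cover idea, again without any LuCas or Lucene API: the `COVER` table, the field names and the `distribute` function are all made up for illustration and do not reflect the actual LuCas configuration format. It only shows the principle of routing each term to every field named in a predefined, possibly overlapping, cover.

```python
from collections import defaultdict

# Hypothetical cover definition: term -> set of target fields (sets may overlap).
COVER = {
    "entity1": {"facet_building", "facet_politics"},
    "entity2": {"facet_politics"},
}


def distribute(terms, cover):
    """Route each term to every field its cover entry names.

    Terms without a cover entry are dropped in this sketch.
    """
    fields = defaultdict(list)
    for term in terms:
        for field in sorted(cover.get(term, ())):
            fields[field].append(term)
    return dict(fields)


doc_fields = distribute(["entity1", "entity2", "unmapped"], COVER)
print(doc_fields)
```

Here "entity1" lands in two fields and "facet_politics" receives two terms, which is the n:n relation mentioned above.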
>
> In short: I, too, ultimately don't use Lucene but Solr/ES. However, LuCas
> has some (Lucene) document fine-tuning capabilities I need/work
> with.
> This means: I don't necessarily need LuCas in a Lucene-updated version. I
> use it more as a fine-tuned TokenStream-smith. I could require it to be
> updated in the future when LuCas is not able to express a specific feature
> of a newer Lucene version.
>
> I hope this wall of text was understandable, thanks for reading through it
> ;-)
>
> Best,
>
> Erik
>
>
>
>> On 26 Aug 2014, at 09:43, <Armin.Wegner@bka.bund.de> wrote:
>>
>> Hi Erik and Jörn,
>>
>> I've used Solr in the meantime. It is so easy to quickly write a CAS
>> consumer that sends documents to a Solr web service. Writing to a Lucene
>> index is minimally more work. Could this be the reason why nobody cares
>> about the outdated version? Is there really a need for Lucas and Solrcas
>> anymore? What do you think? It would be nice to have some opinions on
>> this.
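
As a minimal sketch of what "sending documents to a Solr web service" can look like, the following builds a JSON update request with the Python standard library only. The host, the core name `mycore` and the document fields are assumptions made up for this example; a real CAS consumer would fill the documents from the CAS. The request is only constructed, not sent, so the sketch runs without a Solr server.

```python
import json
import urllib.request


def build_solr_update(base_url, core, docs, commit=True):
    """Build a POST request carrying a JSON array of documents for Solr's update handler."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit:
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, core and fields; adjust to the actual Solr setup.
request = build_solr_update(
    "http://localhost:8983/solr",
    "mycore",
    [{"id": "doc1", "text": "Barack Obama entered the White House."}],
)
# urllib.request.urlopen(request) would send it to a running Solr instance.
print(request.full_url)
```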
>>
>> Of all people reading this list, who wants to have a Lucas or Solrcas for
>> the current version of Lucene?
>>
>> Cheers,
>> Armin
>>
>> -----Original Message-----
>> From: Erik Fäßler [mailto:erik.faessler@uni-jena.de]
>> Sent: Friday, 22 August 2014 16:34
>> To: user@uima.apache.org
>> Subject: Re: AW: Lucas
>>
>> I am using LuCas in production in the last SNAPSHOT version that can be
>> found in the SVN but not in the maven repository. I was also not aware a
>> patch would be required to get it to work, I am using it in its current
>> SVN state, including the splitter filter.
>> I would be willing to help with a migration and contribute to
>> discussions/plans. However, I won't have time to do it all on my own,
>> especially since I use it as a bridge to Solr/ElasticSearch that kind of
>> remedies the version difference. Thus I use it with newer Solr/ES versions
>> without problems so far.
>>
>> I will be on vacation for two weeks, after that I'd be available for
>> contributions.
>>
>> Best,
>>
>> Erik
>>
>>> On 22 Aug 2014, at 15:36, Jörn Kottmann <kottmann@gmail.com> wrote:
>>>
>>> It would probably be nice to migrate those to the current versions of
>>> Lucene/Solr.
>>>
>>> Jörn
>>>
>>>> On 08/13/2014 08:44 AM, Armin.Wegner@bka.bund.de wrote:
>>>> Hi Renaud,
>>>>
>>>> that's nice, thank you. Are you using Lucene 4.x or an older version?
>>>>
>>>> It's been a while since I asked that question, and I didn't get much
>>>> response. Is the project dead? Is it just too easy to code a simple
>>>> annotator for Lucene or Solr to justify the effort of maintaining Lucas and
>>>> Solrcas?
>>>>
>>>> Cheers,
>>>> Armin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Renaud Richardet [mailto:renaud.richardet@epfl.ch]
>>>> Sent: Monday, 11 August 2014 23:12
>>>> To: user@uima.apache.org
>>>> Subject: Re: Lucas
>>>>
>>>> Hi Armin,
>>>>
>>>> I used it a while ago. I had to apply the following patch to make it
>>>> work:
>>>> https://gist.github.com/renaud/bc34a48ca22f787f6c11
>>>>
>>>> HTH, Renaud
>>>>
>>>>
>>>>> On Mon, Jul 28, 2014 at 2:55 PM, <Armin.Wegner@bka.bund.de> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> Is someone using Lucas? It seems to be slightly outdated. It depends
>>>>> on Lucene 2.9.3. Lucene is at version 4.9.0 right now. Is there an
>>>>> alternative?
>>>>>
>>>>> Regards,
>>>>> Armin
>>>>
>>>> --
>>>> Renaud Richardet
>>>> Blue Brain Project  PhD candidate
>>>> EPFL  Station 15
>>>> CH-1015 Lausanne
>>>> phone: +41-78-675-9501
>>>> http://people.epfl.ch/renaud.richardet
>>>
>