uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Marking consecutive tokens with RUTA
Date Sun, 14 Jun 2015 12:40:28 GMT
Hi,

the descriptors are always created at compile time.

In Ruta 2.2.1, yes, you need to create the descriptors in the UIMA Ruta 
Workbench and then copy them or make them available in some other way. 
This is especially necessary if you declare additional types (type 
system descriptor changes) or add some subscript (analysis engine 
descriptor changes).
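
For the copy step, a minimal sketch in plain Java (no UIMA calls), assuming the Workbench project keeps its generated descriptors as *.xml files in a single folder. The class name DescriptorSync and both default paths are hypothetical placeholders, not part of Ruta:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ruta: mirrors generated descriptor files
// from one folder into another so the application sees the current versions.
final class DescriptorSync {

    /** Copies every *.xml file directly inside sourceDir into targetDir, replacing older copies. */
    static List<Path> copyDescriptors(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> copied = new ArrayList<>();
        try (DirectoryStream<Path> descriptors = Files.newDirectoryStream(sourceDir, "*.xml")) {
            for (Path descriptor : descriptors) {
                Path target = targetDir.resolve(descriptor.getFileName().toString());
                Files.copy(descriptor, target, StandardCopyOption.REPLACE_EXISTING);
                copied.add(target);
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Assumed locations; pass the real folders as arguments.
        Path workbenchDescriptors = Paths.get(args.length > 0 ? args[0] : "../ruta-project/descriptor");
        Path applicationDescriptors = Paths.get(args.length > 1 ? args[1] : "src/main/resources/descriptor");
        if (!Files.isDirectory(workbenchDescriptors)) {
            System.out.println("No descriptor folder at " + workbenchDescriptors);
            return;
        }
        for (Path copy : copyDescriptors(workbenchDescriptors, applicationDescriptors)) {
            System.out.println("copied " + copy);
        }
    }
}
```

Running this as a step before the application build keeps the copies in sync with the Workbench project, but the descriptors still have to be regenerated in the Workbench whenever the script or its types change.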

In Ruta 2.3.0, which was just released, there is a Maven plugin for 
building the descriptors. Take a look at: 
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#d5e3271
This means that you no longer need UIMA Ruta Workbench projects; 
you can use the Workbench's development support and descriptor building 
in normal Maven projects.

Best,

Peter

On 12.06.2015 at 21:38, Diego Buoro wrote:
> Hello Peter
>
> We tried your suggestions and they worked like a charm, thanks :D
> However, we are facing another problem: it seems that our application isn't
> creating the mainTypesystem and mainEngine files when we launch it. We
> don't know whether that is the default behavior, but for now we
> have to create these files in a separate project and then copy them to
> the application whenever we change the script, which is a bad solution.
> Do you have any suggestions?
>
> All Best,
>
> Diego
>
> 2015-06-12 9:19 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>
>> Hi Peter, Armin
>>
>> Thanks for the observations, I hope we can finally get this working.
>> We will try the changes in the next few days and then give you feedback.
>>
>> All Best,
>>
>> Diego
>>
>>
>>
>> 2015-06-03 14:14 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>
>>> Hi Peter, the example we used is the small sentence inside a string at
>>> the end of UIMAChecker.java: "Refiro-me à trabalho remunerado.".
>>> Based on the Main.ruta we sent you, we expected the output to contain 7
>>> "PROBLEM" annotations. This part is working.
>>> The problem is when we change the last line of Main.ruta from
>>> "cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we
>>> expected 6 "PROBLEM" annotations: the same ones we had in the first
>>> example, except for the first one. That's what happens when you run the
>>> script in a simple Ruta project, but when we run it in the Java
>>> application we get 0 "PROBLEM" annotations.
>>> We think this difference arises because in the Ruta project we
>>> don't use plain text as input. Instead, we feed it a preprocessed xmi
>>> file. In the Java application, on the other hand, we do the processing
>>> ourselves via the processCas method. It's possible that the processCas
>>> method is creating tokens in a way that prevents us from detecting when one
>>> is next to the other in the Ruta script.
>>> We are sending you the xmi file to use as an example for a simple Ruta
>>> project. If there are any other examples you'd like us to send you, just
>>> say the word :D
>>>
>>> Best,
>>>
>>> Diego
>>>
>>> 2015-06-01 11:15 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>
>>>> Sorry, please disregard my last answer. The idea wasn't to use the xmi;
>>>> we are still working on a minimal example to provide to you.
>>>> We will send it to you in the next few days.
>>>>
>>>> 2015-06-01 10:37 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>
>>>>> Hi Peter, how are you doing?
>>>>>
>>>>> We were trying to run using files such as Crase01.xmi and
>>>>> rule_xml_001.xmi.
>>>>> Our goal is to run those two simpler ones first, and then run
>>>>> with Crase.xmi.
>>>>>
>>>>> About the package declaration, I still need to check which Ruta version
>>>>> we are using.
>>>>> I will check this soon.
>>>>>
>>>>> All Best,
>>>>>
>>>>> Diego
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2015-05-30 0:45 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>>
>>>>>> Hi Peter!
>>>>>> No problem, I appreciate your support.
>>>>>>
>>>>>> All Best,
>>>>>>
>>>>>> Diego
>>>>>>
>>>>>> 2015-05-27 14:22 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>>>
>>>>>>> Hi Peter!
>>>>>>> We call the script with the following lines:
>>>>>>>
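>>>>>>> // Load the Ruta script from the classpath (Resources and Charsets are Guava classes)
>>>>>>> URL url = Resources.getResource("Main.ruta");
>>>>>>> String text = Resources.toString(url, Charsets.UTF_8);
>>>>>>> // Build an analysis engine from the script text and our type system description (tsd)
>>>>>>> AnalysisEngineDescription aeDes =
>>>>>>>     Ruta.createAnalysisEngineDescription(text, tsd);
>>>>>>> this.ae = UIMAFramework.produceAnalysisEngine(aeDes);
>>>>>>>
>>>>>>> // Fill a new CAS with our own annotations, then run the script on it
>>>>>>> CAS cas = ae.newCAS();
>>>>>>> converter.populateCas(sentence.getTextSentence(), cas);
>>>>>>> ae.process(cas);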
>>>>>>>
>>>>>>> The populateCas method is responsible for translating our annotations
>>>>>>> into RUTA annotations, but it doesn't set any type priority explicitly.
>>>>>>> We don't know much about type priorities; the RUTA references we
>>>>>>> found say very little about them. Are they necessary for doing
>>>>>>> what we need?
>>>>>>>
>>>>>>> The file that contains the above lines is available here:
>>>>>>>
>>>>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
>>>>>>> The processCAS method is available here:
>>>>>>>
>>>>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
>>>>>>> The script we are calling is available here:
>>>>>>>
>>>>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta
>>>>>>>
>>>>>>> PS: Yes, we remembered the semicolons.
>>>>>>>
>>>>>>> Thanks for the help :)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2015-05-26 15:30 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>>>>
>>>>>>>> I think I wasn't clear enough, so I should be more specific.
>>>>>>>>
>>>>>>>> I have a type system in which all words have been annotated as
>>>>>>>> Tokens. I am calling a RUTA script from a Java class, and that script
>>>>>>>> has only one rule:
>>>>>>>> Token Token {-> Problem}
>>>>>>>>
>>>>>>>> However, with this script, no Problems are created. When I try
>>>>>>>> Token {-> Problem}
>>>>>>>>
>>>>>>>> I get one problem for each Token, which is what I expected. Why
>>>>>>>> can't I create annotations using rules with more than one word?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2015-05-26 14:49 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>>>>>
>>>>>>>>> Hello guys, how are you doing?
>>>>>>>>>
>>>>>>>>> I would like to know, once I have called RUTA from a Java project,
>>>>>>>>> how I can mark consecutive tokens as a "Problem" (the name of my
>>>>>>>>> annotation, in this case).
>>>>>>>>>
>>>>>>>>> Thanks in advance!
>>>>>>>>>
>>>>>>>>

