uima-user mailing list archives

From Diego Buoro <jklpo...@gmail.com>
Subject Re: Marking consecutive tokens with RUTA
Date Wed, 03 Jun 2015 17:14:47 GMT
Hi Peter,

The example we used is the short sentence in the string at the end of
UIMAChecker.java: "Refiro-me à trabalho remunerado.".

Based on the Main.ruta we sent you, we expected the output to contain 7
"PROBLEM" annotations. This part is working.

The problem appears when we change the last line of Main.ruta from
"cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we
expected 6 "PROBLEM" annotations: the same ones as in the first example,
except for the first one. That is what happens when you run the script in a
simple Ruta project, but when we run it in the Java application we get 0
"PROBLEM" annotations.

We think the difference arises because in the Ruta project we don't use
plain text as input. Instead, we feed it a preprocessed xmi file. In the
Java application, on the other hand, we do the processing ourselves via the
processCas method. It's possible that the processCas method is creating
tokens in a way that prevents the Ruta script from detecting when one token
is next to another.

We are sending you the xmi file to use as an example for a simple Ruta
project. If there are any other examples you'd like us to send, just
say the word :D
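
One way to check the hypothesis above without running Ruta is to look only at
the token offsets that end up in the CAS. The following is a minimal sketch in
plain Java, not part of the project: the class TokenAdjacencyCheck, its Span
type, and the 7-token split of the example sentence are all assumptions made
for illustration. It does not reproduce Ruta's matching; it only reports
whether each token starts where the previous one ends, allowing whitespace in
between.

```java
import java.util.ArrayList;
import java.util.List;

/** Offset-only diagnostic (hypothetical helper); it does not run Ruta or UIMA. */
final class TokenAdjacencyCheck {

    /** Hypothetical stand-in for a token annotation: [begin, end) character offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** True if right starts at or after left's end and only whitespace lies between them. */
    static boolean isAdjacent(String text, Span left, Span right) {
        if (left.end < 0 || left.end > right.begin || right.begin > text.length()) {
            return false; // overlapping, out of order, or outside the text
        }
        for (int i = left.end; i < right.begin; i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                return false; // non-whitespace text between the two tokens
            }
        }
        return true;
    }

    /** Counts consecutive pairs (in list order) that pass isAdjacent. */
    static int countAdjacentPairs(String text, List<Span> tokens) {
        int count = 0;
        for (int i = 1; i < tokens.size(); i++) {
            if (isAdjacent(text, tokens.get(i - 1), tokens.get(i))) {
                count++;
            }
        }
        return count;
    }

    /** Assumed tokenization of the example sentence (7 tokens); not taken from the thread. */
    static List<Span> assumedTokens() {
        List<Span> tokens = new ArrayList<>();
        tokens.add(new Span(0, 6));   // Refiro
        tokens.add(new Span(6, 7));   // -
        tokens.add(new Span(7, 9));   // me
        tokens.add(new Span(10, 11)); // à
        tokens.add(new Span(12, 20)); // trabalho
        tokens.add(new Span(21, 31)); // remunerado
        tokens.add(new Span(31, 32)); // .
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Refiro-me \u00e0 trabalho remunerado.";
        List<Span> tokens = assumedTokens();
        System.out.println(countAdjacentPairs(text, tokens) + " of "
                + (tokens.size() - 1) + " consecutive pairs are adjacent");
    }
}
```

With the assumed offsets, all 6 consecutive pairs are adjacent, which is
consistent with the 6 annotations expected from the two-token rule. If the
offsets produced by the Java application gave a lower count here, that would
support the idea that the tokens are created differently in the two setups; a
full count would point to some other cause.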

Best,

Diego

2015-06-01 11:15 GMT-03:00 Diego Buoro <jklports@gmail.com>:

> Sorry, please disregard my last answer. The idea wasn't to use the xmi; we
> are still working on a minimal example to provide to you.
> We will send it to you in the next few days.
>
> 2015-06-01 10:37 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>
>> Hi Peter, how are you doing?
>>
>> We were trying to run it using files such as Crase01.xmi and
>> rule_xml_001.xmi.
>> Our goal is to run those two simpler ones first, and then run it with
>> Crase.xmi.
>>
>> About the package declaration, I still need to check which Ruta version
>> we are using. I will check this soon.
>>
>> All Best,
>>
>> Diego
>>
>>
>>
>>
>>
>> 2015-05-30 0:45 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>
>>> Hi Peter!
>>> No problem, I appreciate your support.
>>>
>>> All Best,
>>>
>>> Diego
>>>
>>> 2015-05-27 14:22 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>
>>>> Hi Peter!
>>>> We call the script with the following lines:
>>>>
>>>> // Load the Ruta script from the classpath and build the engine from it.
>>>> URL url = Resources.getResource("Main.ruta");
>>>> String text = Resources.toString(url, Charsets.UTF_8);
>>>> AnalysisEngineDescription aeDes =
>>>>     Ruta.createAnalysisEngineDescription(text, tsd);
>>>> this.ae = UIMAFramework.produceAnalysisEngine(aeDes);
>>>>
>>>> // Fill a fresh CAS with our annotations, then run the script on it.
>>>> CAS cas = ae.newCAS();
>>>> converter.populateCas(sentence.getTextSentence(), cas);
>>>> ae.process(cas);
>>>>
>>>> The populateCas method is responsible for translating our annotations
>>>> into RUTA annotations, but it doesn't set any type priority explicitly.
>>>> We don't know much about type priorities; the RUTA references we found
>>>> say very little about them. Are they necessary for what we need to do?
>>>>
>>>> The file that contains the above lines is available here:
>>>>
>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
>>>> The processCAS method is available here:
>>>>
>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
>>>> The script we are calling is available here:
>>>>
>>>> https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta
>>>>
>>>> PS: Yes, we remembered the semicolons.
>>>>
>>>> Thanks for the help :)
>>>>
>>>>
>>>>
>>>> 2015-05-26 15:30 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>
>>>>> I think I wasn't clear enough, and I should be more specific.
>>>>>
>>>>> I have a type system in which all words have been annotated as Tokens.
>>>>> I am calling a RUTA script from a Java class, and that script has only
>>>>> one rule:
>>>>> Token Token {-> Problem}
>>>>>
>>>>> However, with this script, no Problems are created. When I try
>>>>> Token {-> Problem}
>>>>>
>>>>> I get one problem for each Token, which is what I expected. Why can't
>>>>> I create annotations using rules with more than one word?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2015-05-26 14:49 GMT-03:00 Diego Buoro <jklports@gmail.com>:
>>>>>
>>>>>> Hello guys, how are you doing?
>>>>>>
>>>>>> I would like to know: once I have called RUTA from a Java project, how
>>>>>> can I mark consecutive tokens as a "Problem" (the name of my annotation,
>>>>>> in this case)?
>>>>>>
>>>>>> Thanks in advance!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
