uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Marking consecutive tokens with RUTA
Date Thu, 11 Jun 2015 08:57:41 GMT
I assumed that these zero-length annotations do not cause problems
anymore... I was wrong and I should do something about it. Either they
will really be ignored completely from now on, or I need to change the
sequential matching so that they are consumed somehow. If anyone is
interested, I can explain the problems and implications in more detail
in a new Jira issue.

Best,

Peter
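
For readers who want to check their own CAS-population code for this situation, here is a minimal sketch in plain Java. It is not tied to the UIMA API: the Span class and the ZeroLengthCheck helpers are hypothetical names introduced only for illustration. The sketch models annotations as begin/end offsets and flags those with begin equal to end, which is what a document-level annotation looks like when no document text has been set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an annotation: a type name plus offsets.
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    // A span covers no text when begin and end coincide.
    boolean isZeroLength() {
        return begin == end;
    }
}

final class ZeroLengthCheck {
    // Returns the spans that cover no text (begin == end).
    static List<Span> findZeroLength(List<Span> spans) {
        List<Span> result = new ArrayList<>();
        for (Span s : spans) {
            if (s.isZeroLength()) {
                result.add(s);
            }
        }
        return result;
    }

    // Offsets of a document-level span for the given text.
    // A null or empty text yields 0..0, i.e. a zero-length span.
    static Span documentSpan(String text) {
        int length = (text == null) ? 0 : text.length();
        return new Span("DocumentAnnotation", 0, length);
    }
}
```

Running findZeroLength over the offsets of everything added to the CAS before calling the engine would point at the annotations discussed in this thread. In the real application the fix is the one described in the quoted message: set the document text before populating the CAS.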

On 11.06.2015 at 08:38, Armin.Wegner@bka.bund.de wrote:
> Hi,
>
> Yeah, that once hit me, too. It has something to do with the internal sorting of
> annotations with the same start offset. I had annotated some metadata for the whole
> document in an annotation with start offset 0 and end offset 0. That does not work:
> the end offset must be the length of the document text. With that, it works fine.
>
> Cheers,
> Armin
>
> -----Original Message-----
> From: Peter Klügl [mailto:peter.kluegl@averbis.com]
> Sent: Wednesday, 10 June 2015 21:28
> To: user@uima.apache.org
> Subject: Re: Marking consecutive tokens with RUTA
>
> Hi,
>
> here are the results of my investigations:
>
> - The text of the document is not set directly. You should add something
> like cas.setDocumentText(sentence.getDocumentText()); before populating
> the CAS in your method. Otherwise there will be a DocumentAnnotation of
> length 0. Ruta does not like these... that's the source of the problem.
> If you add the line, or avoid zero-length annotations somehow, then the
> rules should work just fine.
>
> - I'd rather use tcas.addFsToIndexes(sentenceAnn); instead of 
> tcas.getIndexRepository().addFS(sentenceAnn); (but that shouldn't change 
> anything)
>
> - You access the problem type "cogroo.ruta.Base.PROBLEM", but the rules 
> seem to use the type "Main.PROBLEM"
>
> Best,
>
> Peter
>
>
> On 03.06.2015 at 19:14, Diego Buoro wrote:
>> Hi Peter, the example we used is the short sentence inside a string at
>> the end of UIMAChecker.java: "Refiro-me à trabalho remunerado.".
>> Based on the Main.ruta we sent you, we expected the output to contain
>> 7 "PROBLEM" annotations. This part is working.
>> The problem appears when we change the last line of Main.ruta from
>> "cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we
>> expected 6 "PROBLEM" annotations: the same ones we had in the first
>> example, except for the first one. That's what happens when you run
>> the script in a simple Ruta project, but when we run it in the Java
>> application we get 0 "PROBLEM" annotations.
>> We think this difference arises because in the Ruta project we
>> don't use plain text as input. Instead, we feed it a preprocessed
>> xmi file. In the Java application, on the other hand, we do the
>> processing ourselves via the processCas method. It's possible that the
>> processCas method is creating tokens in a way that prevents us from
>> detecting when one is next to the other in the Ruta script.
>> We are sending you the xmi file to use as an example for a simple Ruta
>> project. If there are any other examples you'd like us to send you,
>> just say the word :D
>>
>> Best,
>>
>> Diego
>>
>> 2015-06-01 11:15 GMT-03:00 Diego Buoro <jklports@gmail.com 
>> <mailto:jklports@gmail.com>>:
>>
>>     Sorry, please disregard my last answer. The idea wasn't to use the
>>     xmi; we are still working on a minimal example to provide to you.
>>     We will send it to you in the next few days.
>>
>>     2015-06-01 10:37 GMT-03:00 Diego Buoro <jklports@gmail.com
>>     <mailto:jklports@gmail.com>>:
>>
>>         Hi Peter, how are you doing?
>>
>>         We were trying to run using files such as Crase01.xmi and
>>         rule_xml_001.xmi.
>>         Our goal is to run those two simpler ones first, and
>>         then run with Crase.xmi.
>>
>>         About the package declaration, I still need to check which Ruta
>>         version we are using.
>>         I will check this soon.
>>
>>         All Best,
>>
>>         Diego
>>
>>
>>
>>
>>
>>         2015-05-30 0:45 GMT-03:00 Diego Buoro <jklports@gmail.com
>>         <mailto:jklports@gmail.com>>:
>>
>>             Hi Peter!
>>             No problem, I appreciate your support.
>>
>>             All Best,
>>
>>             Diego
>>
>>             2015-05-27 14:22 GMT-03:00 Diego Buoro <jklports@gmail.com
>>             <mailto:jklports@gmail.com>>:
>>
>>                 Hi Peter!
>>                 We call the script with the following lines:
>>
>>                 URL url = Resources.getResource("Main.ruta");
>>                 String text = Resources.toString(url, Charsets.UTF_8);
>>                 AnalysisEngineDescription aeDes =
>>                     Ruta.createAnalysisEngineDescription(text, tsd);
>>                 this.ae = UIMAFramework.produceAnalysisEngine(aeDes);
>>
>>                 CAS cas = ae.newCAS();
>>                 converter.populateCas(sentence.getTextSentence(), cas);
>>                 ae.process(cas);
>>
>>                 The populateCas method is responsible for translating
>>                 our annotations into RUTA annotations, but it doesn't
>>                 set any type priority explicitly.
>>                 We don't know much about type priorities; the RUTA
>>                 references we found say very little about them. Are
>>                 they necessary for doing what we need?
>>
>>                 The file that contains the above lines is available here:
>>                 https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
>>                 The processCAS method is available here:
>>                 https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
>>                 The script we are calling is available here:
>>                 https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta
>>
>>                 PS: Yes, we remembered the semicolons.
>>
>>                 Thanks for the help :)
>>
>>
>>
>>                 2015-05-26 15:30 GMT-03:00 Diego Buoro
>>                 <jklports@gmail.com <mailto:jklports@gmail.com>>:
>>
>>                     I think I wasn't clear enough, so I should be
>>                     more specific.
>>
>>                     I have a type system in which all words have been
>>                     annotated as Tokens. I am calling a RUTA script
>>                     from a Java class, and that script has only one rule:
>>                     Token Token {-> Problem}
>>
>>                     However, with this script, no Problems are
>>                     created. When I try
>>                     Token {-> Problem}
>>
>>                     I get one problem for each Token, which is what I
>>                     expected. Why can't I create annotations using
>>                     rules with more than one word?
>>
>>                     Thanks
>>
>>
>>
>>
>>                     2015-05-26 14:49 GMT-03:00 Diego Buoro
>>                     <jklports@gmail.com <mailto:jklports@gmail.com>>:
>>
>>                         Hello guys, how are you doing?
>>
>>                         I would like to know: once I have called RUTA
>>                         from a Java project, how can I mark
>>                         consecutive tokens as a "Problem" (the name of
>>                         my annotation, in this case)?
>>
>>                         Thanks in advance!
>>
>>
>>
>>
>>
>>
>>
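
The counts expected earlier in the thread (7 PROBLEM annotations for the one-token rule, 6 for the two-token rule on the same 7 tokens) can be reproduced with a small model in plain Java. This is only a sketch of the expectation, not Ruta's actual matching algorithm: tokens are represented by their position in reading order, and ConsecutivePairs with its two methods is a hypothetical name introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ConsecutivePairs {
    // Models "cgToken{->PROBLEM};": every token is marked.
    static List<Integer> markedBySingleRule(int tokenCount) {
        List<Integer> marked = new ArrayList<>();
        for (int i = 0; i < tokenCount; i++) {
            marked.add(i);
        }
        return marked;
    }

    // Models "cgToken cgToken{->PROBLEM};": the action sits on the
    // second rule element, so a token is marked only when another
    // token comes directly before it. The first token is never marked.
    static List<Integer> markedByPairRule(int tokenCount) {
        List<Integer> marked = new ArrayList<>();
        for (int i = 1; i < tokenCount; i++) {
            marked.add(i);
        }
        return marked;
    }
}
```

With tokenCount set to 7, the first method marks all seven positions and the second marks positions 1 through 6, which matches the "same ones, except for the first one" expectation. Getting 0 annotations instead suggests that the sequential match never gets from one token to the next, which is consistent with the zero-length annotation explanation given in the replies.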

