uima-user mailing list archives

From <Armin.Weg...@bka.bund.de>
Subject Re: Marking consecutive tokens with RUTA
Date Thu, 11 Jun 2015 06:38:51 GMT

Yes, that hit me once, too. It has something to do with the internal sorting of annotations
that share the same start offset. I had annotated some metadata for the whole document in an
annotation with start offset 0 and end offset 0, which does not work. Once the end offset is
set to the length of the document text, everything is fine.
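
To illustrate the sorting point, here is a minimal sketch in plain Java. It does not use UIMA classes: Span, INDEX_ORDER and wholeDocument are hypothetical names made up for this example. It only models the ordering of UIMA's built-in annotation index (begin ascending, then end descending, ignoring type priorities) and shows that a metadata annotation at offset 0 changes its position relative to the first token depending on its end offset. Whether this ordering is exactly what trips Ruta is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class AnnotationOrderSketch {

    /** Hypothetical stand-in for an annotation; not a UIMA class. */
    static final class Span {
        final String label;
        final int begin;
        final int end;

        Span(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Begin ascending; on equal begin, end descending (longer span first). */
    static final Comparator<Span> INDEX_ORDER = (a, b) -> {
        if (a.begin != b.begin) {
            return Integer.compare(a.begin, b.begin);
        }
        return Integer.compare(b.end, a.end);
    };

    /** Returns the labels, comma-separated, in sorted order. */
    static String orderedLabels(Span... spans) {
        List<Span> sorted = new ArrayList<>(Arrays.asList(spans));
        sorted.sort(INDEX_ORDER);
        StringBuilder out = new StringBuilder();
        for (Span s : sorted) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(s.label);
        }
        return out.toString();
    }

    /** Metadata meant for the whole document should span 0..text.length(). */
    static Span wholeDocument(String label, String text) {
        return new Span(label, 0, text.length());
    }

    public static void main(String[] args) {
        String text = "Refiro-me \u00e0 trabalho remunerado.";
        Span token = new Span("Token", 0, 6);
        // Zero-length metadata sorts after the token that starts at offset 0.
        System.out.println(orderedLabels(new Span("Meta", 0, 0), token));
        // Whole-document metadata sorts before that token.
        System.out.println(orderedLabels(wholeDocument("Meta", text), token));
    }
}
```

In a real CAS the equivalent fix is simply to create the metadata annotation with the document text length as its end offset instead of 0.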


-----Original Message-----
From: Peter Klügl [mailto:peter.kluegl@averbis.com]
Sent: Wednesday, 10 June 2015 21:28
To: user@uima.apache.org
Subject: Re: Marking consecutive tokens with RUTA


Here are the results of my investigation:

- The text of the document is not set directly. You should add something 
like cas.setDocumentText(sentence.getDocumentText()); before populating 
the CAS in your method. Otherwise there will be a DocumentAnnotation of 
length 0. Ruta does not like these, and that is the source of the problem. 
If you add the line, or otherwise avoid zero-length annotations, the 
rules should work just fine.

- I'd rather use tcas.addFsToIndexes(sentenceAnn); instead of 
tcas.getIndexRepository().addFS(sentenceAnn); (but that shouldn't change 
anything).

- You access the problem type "cogroo.ruta.Base.PROBLEM", but the rules 
seem to use the type "Main.PROBLEM".
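
As a rough aid for the first point, the following plain-Java sketch flags zero-length or out-of-range spans before they are handed to the rules. It is only an illustration under stated assumptions: Span and findSuspectSpans are hypothetical names, not UIMA or Ruta API, and the offsets are examples. It shows why a missing document text is a problem: the document-wide span collapses to [0,0] and the token offsets no longer fit inside the text.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroLengthGuard {

    /** Hypothetical stand-in for an annotation; not a UIMA class. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Describes every span that is empty or does not fit inside the text. */
    static List<String> findSuspectSpans(String documentText, Span... spans) {
        // A text that was never set is treated like an empty text.
        int length = (documentText == null) ? 0 : documentText.length();
        List<String> problems = new ArrayList<>();
        for (Span s : spans) {
            String where = s.type + " [" + s.begin + "," + s.end + "]";
            if (s.begin < 0 || s.begin > s.end || s.end > length) {
                problems.add(where + " is outside the text");
            } else if (s.begin == s.end) {
                problems.add(where + " has zero length");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Span token = new Span("cgToken", 0, 6);

        // Text never set: the document-wide span is [0,0] and the token does not fit.
        System.out.println(
                findSuspectSpans(null, new Span("DocumentAnnotation", 0, 0), token));

        // Text set first: the document-wide span covers the text, nothing is flagged.
        String text = "Refiro-me \u00e0 trabalho remunerado.";
        System.out.println(
                findSuspectSpans(text, new Span("DocumentAnnotation", 0, text.length()), token));
    }
}
```

In the actual application the fix remains the one-line call to cas.setDocumentText(...) before populating the CAS; a check like this would only help to spot the situation early.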



On 03.06.2015 at 19:14, Diego Buoro wrote:
> Hi Peter, the example we used is the small sentence inside a string at 
> the end of UIMAChecker.java: "Refiro-me à trabalho remunerado.".
> Based on the Main.ruta we sent you, we expected the output to contain 
> 7 "PROBLEM" annotations. This part is working.
> The problem is when we change the last line of Main.ruta from 
> "cgToken{->PROBLEM};" to "cgToken cgToken{->PROBLEM};". In this case we 
> expected 6 "PROBLEM" annotations: the same ones we had in the first 
> example, except for the first one. That's what happens when you run 
> the script in a simple Ruta project, but when we run it in the Java 
> application we get 0 "PROBLEM" annotations.
> We think this difference arises because in the Ruta project we 
> don't use plain text as input. Instead, we feed it a preprocessed 
> xmi file. In the Java application, on the other hand, we do the 
> processing ourselves via the processCas method. It's possible that the 
> processCas method is creating tokens in a way that prevents the Ruta 
> script from detecting when one is next to the other.
> We are sending you the xmi file to use as an example for a simple Ruta 
> project. If there are any other examples you'd like us to send you, 
> just say the word :D
> Best,
> Diego
> 2015-06-01 11:15 GMT-03:00 Diego Buoro <jklports@gmail.com 
> <mailto:jklports@gmail.com>>:
>     Sorry, please disregard my last answer. The idea wasn't to use the
>     xmi; we are still working on a minimal example to provide to you.
>     We will send it to you in the next few days.
>     2015-06-01 10:37 GMT-03:00 Diego Buoro <jklports@gmail.com
>     <mailto:jklports@gmail.com>>:
>         Hi Peter, how are you doing?
>         We were trying to run using files such as Crase01.xmi and 
>         rule_xml_001.xmi.
>         Our goal is to run those two simpler ones first, and
>         then run with Crase.xmi.
>         About the package declaration, I still need to check which Ruta
>         version we are using.
>         I will check this soon.
>         All Best,
>         Diego
>         2015-05-30 0:45 GMT-03:00 Diego Buoro <jklports@gmail.com
>         <mailto:jklports@gmail.com>>:
>             Hi Peter!
>             No problem, I appreciate your support.
>             All Best,
>             Diego
>             2015-05-27 14:22 GMT-03:00 Diego Buoro <jklports@gmail.com
>             <mailto:jklports@gmail.com>>:
>                 Hi Peter!
>                 We call the script with the following lines:
>                 URL url = Resources.getResource("Main.ruta");
>                 String text = Resources.toString(url, Charsets.UTF_8);
>                 AnalysisEngineDescription aeDes =
>                     Ruta.createAnalysisEngineDescription(text, tsd);
>                 this.ae = UIMAFramework.produceAnalysisEngine(aeDes);
>                 CAS cas = ae.newCAS();
>                 converter.populateCas(sentence.getTextSentence(), cas);
>                 ae.process(cas);
>                 The populateCas method is responsible for translating
>                 our annotations into RUTA annotations, but it doesn't
>                 set any type priority explicitly.
>                 We don't know much about type priorities; the RUTA
>                 references we found say very little about them. Are
>                 they necessary for what we need?
>                 The file that contains the above lines is available here:
>                 https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/UIMAChecker.java
>                 The processCAS method is available here:
>                 https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-gc/src/main/java/org/cogroo/tools/checker/checkers/uima/UimaCasAdapter.java
>                 The script we are calling is available here:
>                 https://github.com/Fichberg/cogroo4/blob/labXP215_Will/cogroo-ruta/script/Main.ruta
>                 PS: Yes, we remembered the semicolons.
>                 Thanks for the help :)
>                 2015-05-26 15:30 GMT-03:00 Diego Buoro
>                 <jklports@gmail.com <mailto:jklports@gmail.com>>:
>                     I think I wasn't clear enough, and I should be
>                     more specific.
>                     I have a type system in which all words have been
>                     annotated as Tokens. I am calling a RUTA script
>                     from a java class, and that script has only one rule:
>                     Token Token {-> Problem}
>                     However, with this script, no Problems are
>                     created. When I try
>                     Token {-> Problem}
>                     I get one problem for each Token, which is what I
>                     expected. Why can't I create annotations using
>                     rules with more than one word?
>                     Thanks
>                     2015-05-26 14:49 GMT-03:00 Diego Buoro
>                     <jklports@gmail.com <mailto:jklports@gmail.com>>:
>                         Hello guys, how are you doing?
>                         I would like to know, once I have called RUTA
>                         from a Java project, how I can mark
>                         consecutive tokens as a "Problem" (the name of
>                         my annotation, in this case).
>                         Thanks in advance!
