uima-user mailing list archives

From Peter Klügl <peter.klu...@averbis.com>
Subject Re: Ruta conflicts with DKPro typesystem
Date Tue, 11 Apr 2017 09:24:09 GMT
Hi,


On 10.04.2017 at 22:51, Hugues de Mazancourt wrote:
> Hi Peter,
>
> strictImports perfectly solves the problem, thank you.
> The advantage over « renaming » the annotation is that it keeps syntax highlighting
> in the Ruta Workbench.

Do you mean the highlighting for being a seeding type (bold grey), or
semantic highlighting for usage in a script?



>
> Your answer raises another question: you wrote
>> If you create the CAS with uimaFIT,
>> then there are also types that are not imported in your script. Well, you
>> would not even need to import the types in order to use them in your script.
> I did create the CAS with uimaFIT. What kind of types are not imported in the script?

It's about what the import statements really do.

By default, strictImports is deactivated. In that case, type references
are resolved against the type system of the CAS given to the RutaEngine,
so the import statements do not matter at all for rule matching. They
could not change the type system of the CAS during processing anyway.
You could write rules using DKPro Core types without importing any type
system, since the DKPro Core types are already part of the type system
of the CAS: you created the CAS using uimaFIT, and uimaFIT adds the type
systems it finds with its classpath scanning functionality (of course
only if the type systems and types.txt are on the classpath).

Initially, the import statements had two purposes. First, the editor
needs to know which type mentions are valid for syntax checking (in
simple Ruta projects there is no uimaFIT classpath scanning). Second,
the generated descriptors need to import the imported type system
descriptors in order to be valid, i.e., so that valid analysis engines
and CASes can be created from the generated analysis engine description.
In the unit tests of ruta-core, for example, you will hardly find any
import statements because they do not matter for the tests. When
strictImports was introduced, the import statements got a real meaning
for the rule inference.
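
To make the difference concrete, here is a minimal, untested sketch of a
script that relies on the behaviour described above. The package name
and the Quantity type are made up for illustration. It assumes that
strictImports is activated on the RutaEngine and that the DKPro Core
type system is deliberately not imported.

```ruta
// Hypothetical script; the package and the Quantity type are illustration only.
PACKAGE example;

// No TYPESYSTEM or IMPORT statement for DKPro Core here. With strictImports
// activated, the short name NUM should then resolve to
// org.apache.uima.ruta.type.NUM, even if the CAS also contains the DKPro
// Core type de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.NUM.
DECLARE Quantity;
NUM{-> MARK(Quantity)};
```

With strictImports deactivated, the same script would resolve NUM against
the complete type system of the CAS and should run into the ambiguity
again.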

Best,

Peter

>
> Best,
>
> — Hugues
>
>
>
>> On 10 Apr 2017 at 12:25, Peter Klügl <peter.kluegl@averbis.com> wrote:
>>
>> Hi,
>>
>>
>> there are two options to avoid ambiguous references to types by
>> their short name.
>>
>>
>> The first one is to use an alias, as you did. However, you have to assign
>> an unambiguous alias. Ruta should check whether the alias is ambiguous but
>> obviously doesn't. Try something like:
>>
>> IMPORT org.apache.uima.ruta.type.NUM FROM
>> org.apache.uima.ruta.engine.BasicTypeSystem AS RutaNum;
>>
>> Then you can use "RutaNum" to refer to
>> org.apache.uima.ruta.type.NUM in your rules.
>>
>>
>> ... or something like IMPORT PACKAGE org.apache.uima.ruta.type FROM
>> org.apache.uima.ruta.engine.BasicTypeSystem AS ruta;
>>
>> ... then you should be able to use ruta.NUM in your rules.
>>
>>
>> (I did not test either example)
>>
>>
>> The second option is to activate the "strictImports" configuration
>> parameter. If activated, the type expressions, e.g., by short name, are
>> only resolved against the types that are imported. Thus, if you do not
>> import the DKPro Core type system, the NUM of the ruta type system will
>> be used. If deactivated, the references are resolved against the names
>> in the type system of the given CAS. If you create the CAS with uimaFIT,
>> then there are also types that are not imported in your script. Well, you
>> would not even need to import the types in order to use them in your script.
>>
>>
>> Both options have their advantages and disadvantages. Using strictImports
>> in generic scripts where you initialize type variables using
>> configuration parameters is problematic. If you have a larger pipeline
>> with unknown components and unknown type systems, strictImports is
>> often required, because there may be conflicts with other components
>> that cannot be known when writing the rules.
>>
>>
>> btw, there is also an updated exemplary project using DKPro Core in ruta:
>>
>> https://github.com/pkluegl/ruta/tree/master/ruta-german-novel-with-dkpro
>>
>>
>>
>> Let me know if this helps or if I should provide more information.
>>
>>
>> Best,
>>
>>
>> Peter
>>
>>
>>
>>
>> On 07.04.2017 at 15:01, Hugues de Mazancourt wrote:
>>> Hi,
>>>
>>> I’m using Ruta to perform information extraction and I mix it in a pipeline
>>> with DKPro-based resources (for POS-tagging and NER). Thus, I have my own type system, Ruta’s
>>> basic type system and some DKPro type systems (especially the one describing Tokens).
>>>
>>> I end up with type conflicts such as (Ruta error) :
>>>
>>>> java.lang.IllegalArgumentException: NUM is ambiguous, use one of the following
>>>> instead : de.tudarmstadt.ukp.dkpro.core.api.syntax.type.dependency.NUM org.apache.uima.ruta.type.NUM
>>>
>>> I tried to use declarations such as :
>>>
>>>> IMPORT org.apache.uima.ruta.type.NUM FROM org.apache.uima.ruta.engine.BasicTypeSystem
>>>> AS NUM;
>>> at the top of my Ruta rule files, but this doesn’t help.
>>>
>>> I guess using « org.apache.uima.ruta.type.NUM » instead of « NUM » would
>>> fix the problem, but this wouldn’t improve the readability of the rules!
>>> The other solution I see would be to create my own unambiguous, readable annotation
>>> and have a rule that marks all org.apache.uima.ruta.type.NUM with that annotation, but I’m
>>> afraid of performance issues due to these redundant annotations.
>>>
>>> Is there any other solution for Ruta to mask some types or alias them ?
>>>
>>> Best,
>>>
>>> Hugues de Mazancourt
>>> http://about.me/mazancourt
>>>
>>>
>>>
>>>
>>>

