uima-user mailing list archives

From Peter Klügl <pklu...@uni-wuerzburg.de>
Subject Re: UIMA Ruta into jar?
Date Fri, 24 Oct 2014 10:21:22 GMT
Hi,

just to summarize possible pitfalls when using a Ruta project developed
in the Workbench in a normal UIMA/Java environment:

There are two parts:

1. Does the CAS contain everything needed?
The CAS needs to contain all types. If the CAS is created using the
analysis engine (descriptor) generated by the Workbench and the
descriptor is still located in the Ruta Workbench, then everything
should work. If the CAS was created using the generated type system
descriptor, then the Ruta type priorities need to be included. If the
descriptors were copied to the Java project, one has to take care that
relative paths are still valid, since the Workbench normally uses import
by location with relative paths. There should be no problems when the
Ruta engine is included in a larger aggregate analysis engine. If the
CAS is created with uimaFIT by automatically collecting the type
systems, then one has to make sure that the type systems of the script
files are included and that the type priorities are not missed. If the
type priorities become too annoying, we could maybe remove them
completely in the future.
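
To illustrate the point about relative paths: a minimal sketch of a
check that lists the imports by location in a copied descriptor that no
longer resolve. It uses only the JDK's XML parser, not UIMA itself. The
class name ImportLocationCheck and its methods are hypothetical, and it
assumes that every location is a relative file path (UIMA also accepts
URLs, which this sketch does not handle).

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of UIMA or Ruta.
public class ImportLocationCheck {

    /** Returns the "location" attribute of every <import> element in the XML. */
    public static List<String> importLocations(String descriptorXml) {
        try {
            NodeList imports = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(descriptorXml)))
                    .getElementsByTagName("import");
            List<String> locations = new ArrayList<>();
            for (int i = 0; i < imports.getLength(); i++) {
                String location = ((Element) imports.item(i)).getAttribute("location");
                // Imports by name carry no location attribute; skip them.
                if (!location.isEmpty()) {
                    locations.add(location);
                }
            }
            return locations;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse descriptor", e);
        }
    }

    /** Returns the locations that do not exist relative to the descriptor's folder. */
    public static List<String> brokenImports(Path descriptorFile) throws IOException {
        String xml = new String(Files.readAllBytes(descriptorFile), StandardCharsets.UTF_8);
        Path folder = descriptorFile.toAbsolutePath().getParent();
        List<String> broken = new ArrayList<>();
        for (String location : importLocations(xml)) {
            // Assumption: the location is a relative file path, not a URL.
            if (!Files.exists(folder.resolve(location).normalize())) {
                broken.add(location);
            }
        }
        return broken;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ImportLocationCheck <descriptor.xml>");
            return;
        }
        for (String location : brokenImports(Paths.get(args[0]))) {
            System.out.println("Unresolved import: " + location);
        }
    }
}
```

Running it on a descriptor that was copied into the Java project prints
one line per import whose target file was not copied along with it.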

2. Is Ruta able to find all resources?
The layout of Ruta projects and the use of absolute paths in the
descriptors have historical reasons. The problem is that if a Java
project includes a Ruta project in its classpath, then the Ruta engine
is not able to find imported resources. The reason is that only the
root of the Ruta project is part of the classpath, not the folders
script/descriptor/resources. Hence, if the absolute paths are no longer
valid, e.g., because the resources have been copied or packed into a
jar, the engine tries to find the resources on the classpath. If,
however, the folder structure was copied as is, the imports no longer
match, e.g., the engine searches for "uima.ruta.example.X", but it is
located in "descriptor/...". What we do is copy the contents of
script/descriptor/resources to the root of the jar. If this jar is
included in the classpath of the Java project, the resources should be
found.
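
The difference between the two layouts can be reproduced with plain
Java. This is only a sketch of the general classpath behaviour, not
Ruta's actual lookup code. The class name ClasspathLayoutDemo is
hypothetical, and the mapping of the dotted name to a path with an
".xml" extension is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of UIMA or Ruta.
public class ClasspathLayoutDemo {

    /** Maps a dotted name such as "uima.ruta.example.X" to a resource path. */
    public static String toResourcePath(String dottedName, String extension) {
        return dottedName.replace('.', '/') + extension;
    }

    /**
     * Creates an empty file at relativePath below a fresh temporary root and
     * reports whether a class loader rooted there finds dottedName as ".xml".
     */
    public static boolean isFound(String relativePath, String dottedName) {
        try {
            Path root = Files.createTempDirectory("ruta-layout");
            Path file = root.resolve(relativePath);
            Files.createDirectories(file.getParent());
            Files.createFile(file);
            // The temporary root plays the role of the jar or project root.
            URL[] classpath = { root.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
                return loader.getResource(toResourcePath(dottedName, ".xml")) != null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "uima.ruta.example.X";
        // Folder structure copied as is: the file sits below "descriptor/".
        System.out.println(isFound("descriptor/uima/ruta/example/X.xml", name));
        // Contents of the descriptor folder copied to the root.
        System.out.println(isFound("uima/ruta/example/X.xml", name));
    }
}
```

The first call reports that the resource is not found and the second
that it is, which is why the contents of the three folders go to the
root of the jar. The temporary directories are left behind; a real test
would clean them up.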

There are already open issues related to these things and we will
improve the handling in the future. I also plan to add a section in the
documentation about the pitfalls after the upcoming restructuring. If I
find the time, I will implement the ruta-maven-plugin, which should
facilitate the development of Ruta scripts in a Maven context.

Best,

Peter


On 23.10.2014 19:36, Alexandre Patry wrote:
> On 14-10-23 09:40 AM, Piyush Paliwal wrote:
>> Hi Richard,
>>
>> it seems to work now. Thanks. As I was only at the testing stage, I
>> forgot to add the other descriptors (OpenNlpTagger, etc.) before the
>> Ruta descriptor in the pipeline. Those were needed so that the CAS can
>> find all types.
>>
>> Though it is a slightly hectic solution (copy and paste), it is
>> workable and therefore great.
> I am glad that you made it work! If you want to reduce XML
> boilerplate, you can look at uimaFIT [1], a library offering a very
> nice Java API to replace XML descriptors.
>
> Alexandre
>
> [1] http://uima.apache.org/uimafit.html
>>
>> Piyush
>>
>> On Thu, Oct 23, 2014 at 8:10 AM, Richard Eckart de Castilho <rec@apache.org> wrote:
>>
>>> On 23.10.2014, at 00:39, Piyush Paliwal <piyushpaliwal90@gmail.com> wrote:
>>>
>>>> As an example, I wish to import the following types from the
>>>> TypeSystem.xml descriptor, which resides in the same folder as the
>>>> script (both files are now in the Java project).
>>>>
>>>> // import the additional annotation types and alias them with short names
>>>>
>>>> IMPORT de.tudarmstadt.ukp.dkpro.core.api.lexmorph.type.pos.NN FROM uima.ruta.example.TypeSystem AS _NN;
>>>>
>>>> IMPORT de.tudarmstadt.ukp.dkpro.core.api.syntax.type.constituent.PP FROM uima.ruta.example.TypeSystem AS _PP;
>>> I assume you are invoking Ruta via uimaFIT? If yes, then you should
>>> make sure that uimaFIT can find all necessary type systems via the
>>> type detection mechanism [1].
>>>
>>> If you are not using uimaFIT or if you have some special way to create
>>> your CASes, make sure that all types that your scripts need are
>>> already loaded when the CAS is created.
>>>
>>> UIMA does not allow changing the type system while a pipeline is
>>> running. Thus the IMPORT declarations will normally not be interpreted
>>> when the script is executed.
>>>
>>> I do not know how the IMPORT (type) AS (alias) is implemented. If the
>>> alias is set up at execution time and not at CAS initialization time,
>>> it should work.
>>>
>>> Alexandre?
>>>
>>> Cheers,
>>>
>>> -- Richard
>>>
>>> [1]
>>> http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#d5e531
>>>
>>
>>
>


