uima-user mailing list archives

From Bonnie MacKellar <bkmackel...@gmail.com>
Subject Re: best practice for building RUTA scripts in Eclipse when they are to be run in Java?
Date Mon, 18 Jan 2016 21:54:38 GMT
Hi,

Thanks for all this info. I still have to read it through carefully. One
thing though - I had already tried the Limit checkbox under the jCasGen
button, but it doesn't seem to work - the ruta types still get generated. I
also set it as a preference (Window->Preferences->UIMA Preferences, clicked
the checkbox that says Limit JCasGen to types defined in this project) but
that didn't work either.

I will check on the RutaDescriptorFactory. So far, what I am doing works,
but my rules are not yet very complex.

thanks,
Bonnie MacKellar

On Mon, Jan 18, 2016 at 10:09 AM, Peter Klügl <peter.kluegl@averbis.com>
wrote:

> Hi,
>
> comments inline...
>
> Some background information at first:
> The UIMA Ruta Workbench and its project layout were initially designed to
> support the user in developing rules. Thus, the project structure is "as
> simple as possible" and tailored for this task. However, the deployment
> and integration of the rules in actual applications was neglected: there
> is no direct support for generic build tools, e.g., for packaging
> the rules and descriptors correctly in a jar. This often leads to
> situations where users need to perform manual steps, which is a no-go for
> larger projects.
>
> In my opinion, maven has more advantages than disadvantages and is a
> must for development of larger java projects. Thus, I created a maven
> plugin that takes over the tasks the Workbench was responsible for: it
> creates the descriptors and compiles the wordlists. I also adapted the
> Workbench to work with normal maven based projects. So now, at least
> when I write serious ruta rules I do not use the old project layout in
> the Workbench, but a common maven project with additional plugins. There
> you can specify the packaging of your ruta based application or just see
> the ruta files as some resources in a java file, and you can add
> automatic unit tests. It's much more flexible and powerful and I think
> it's worth the trouble. Sometimes I also use the old project layout, but
> just for some quick testing.
>
> I cannot provide a general introduction to maven. Best to start by using
> an existing project and learn by modifying and adapting it to your use
> case.
>
> On 14.01.2016 at 18:45, Bonnie MacKellar wrote:
> > Hi,
> >
> > Sorry that the images did not go through! I had hoped to avoid typing
> >
> > Here it is
> > The top level
> >
> > ECClassifier
> >   - bin
> >   --desc
> >   -descriptor
> >       -utils
> >       -BasicEngine.xml
> >       -BasicTypeSystem.xml
> >       -InternalTypeSystem.xml
> >       -testrulesEngine.xml
> >       -testrulesTypeSystem.xml
> >    -input
> >    -metadata
> >    -output
> >    -script
> >          -testrules.ruta
> >     -src
> >        -desc
> >           -types
> >               -BasciMetaMapTypeSystem.xml
> >               -BasicTypeSystem.xml
> >               -InternalTypeSystem.xml
> >               -MetaMapApiAE.xml
> >               -MetaMapApiTypeSystem.xml
> >               -testrulesTypeSystem.xml
> >         -ec
> >            -metamap
> >               -PipelineSystem.java (this is the main application)
> >         -gov  (all of the Metamap types were copied here for some reason)
> >         -META-INF
> >             -org.apache.uima.fit
> >                  -types.txt
> >        -ruta
> >          -ec
> >              -testrules.ruta  (copied from above)
> >         -testrules (these are the types generated by jCasGen, which come
> >            from my Ruta script)
> >             -Relational_Type.java
> >             -Relational.java
> >             -UMLSConcept_type.java
> >             -UMLSConcept.java
> >
> > Unfortunately I am rather new to all of this, so I am not totally following
> > some of your answers. I put my comments and questions inline. The
> > description of integrating Ruta into Java in the tutorial doesn't say much
> > about project layout in Eclipse so I had to do lots of internet searches
> > and copy what I found, so I am sure I am not doing things in the best way.
>
> What did you miss? I should link/add more examples. I will add more
> documentation when I find the time.
>
>
> > On Thu, Jan 14, 2016 at 11:19 AM, Peter Klügl <peter.kluegl@averbis.com>
> > wrote:
> >
> >> Hi,
> >>
> >> just a few short first comments... more tomorrow...
> >>
> >> - Unfortunately, the images did not make it (due to the mailing list
> >> settings?). You can send me the mail directly if you want.
> >> - I really prefer now to develop ruta script in maven built projects. Is
> >> maven an option for you?
> >>
> > I don't know Maven very well and really did not want to add another layer
> > of complexity to this already very complex system. How does Maven help?
>
> It automates the manual steps like generating the JCas classes and
> copying the descriptors. There is a maven plugin for JCasGen and one for
> ruta. Thus the build process results in exactly the jar you want. It's
> really terrible in the beginning, but I would not want to miss it now.
>
> >
> >> - You can limit JCasGen to the current project. Then, only local type
> >> systems are used to generate the classes and the problem with overriding
> >> RutaBasic is avoided. However, if you copy the descriptors, that does
> >> not help.
> >>
> > I copied the descriptors, and then used jCasGen on the descriptor down in
> > the src folder. How do I limit JCasGen?
> >
>
> There is a checkbox right below the JCasGen button in the component
> descriptor editor. The maven plugin also supports this option. If you
> want to generate JCas classes from ruta scripts, then you have to take
> care that the types of BasicTypeSystem are not generated as well. The
> only way I see right now to avoid that is this option but it only works
> if the type system is located in a different project. Thus, no copying
> and no old project layout (or some hacks). With the maven plugin, the
> descriptors can be loaded from, e.g., the classpath of the project, and
> do not need to be located within the project.
>
> >> - JCasGen on generated type systems of ruta scripts can be tricky
> >> (because ruta imports the BasicTypeSystem by default and this one should
> >> not be generated anew). I rather recommend to define JCas cover class
> >> type in separate type systems.
> >>
> > Not sure I follow what this means
> >
>
> I normally have additional typesystem descriptors in my ruta projects
> which contain the types that are used to generate the java classes. The
> types defined in the ruta scripts are not used for generating the JCas
> classes but only for intermediate annotations. This is probably only a
> personal preference.
>
>
> >> - Copying descriptors should be avoided in general
> >>
> > So how do I develop using the workbench but get the results into Java? All
> > of the posts I read stressed that you have to have the type descriptors and
> > script under the src folder, but the workbench doesn't want them there. The
> > only online example I could find that uses both MetaMap and Ruta had a
> > layout kind of similar to mine.
>
> The maven plugin is able to generate the buildpath that is required by
> the Workbench to work correctly. It contains the ruta specific source
> paths. So the Workbench can also be used in normal java projects if they
> are built with maven. Then, everything is automatically usable in Java.
>
> >
> >> - Do you need the descriptors of ruta at all? Did you define new types
> >> in ruta scripts? The java code does not make use of the ruta descriptors
> >>
> > Yes, I define new types, and eventually there will be lots of types. The
> > current Ruta script is a test case only. I am not sure what you mean when
> > you say the java code does not make use of the descriptors. If I don't have
> > them, I get lots of runtime errors.
> >
>
> Hmmm... I missed the types.txt, which is responsible for making the types
> available.
>
> Your code lines
> AnalysisEngine rae =
> AnalysisEngineFactory.createEngine(RutaEngine.class,
> RutaEngine.PARAM_MAIN_SCRIPT,
>            "testrules");
> AnalysisEngineDescription rutaEngineDesc =
> AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
>
> use the RutaEngine implementation in order to create the analysis engine
> description. This description does not refer to the types defined in the ruta
> scripts. This works for simple script projects but will fail for more
> complicated ones, e.g., when you have several scripts importing each
> other. There is a helper class in ruta for generating descriptors from
> scripts including types:
> org.apache.uima.ruta.descriptor.RutaDescriptorFactory
>
>
> >> - The way you create the ruta descriptors in the java example does not
> >> support all ruta functionality, e.g. , new types
> >>
> > I am probably doing it completely wrong :-). I couldn't find many examples.
>
> No, a lot of functionality was added but not as much documentation. Let
> me know if you want to have more information about the
> RutaDescriptorFactory. The maven plugin uses this class to generate the
> descriptors.
>
> I hope this helps a bit. Just ask if you have more questions or if
> something is not clear.
>
> Best,
>
> Peter
>
>
> >
> >> - The duplicate import is fixed in the next release
> >>
> > OK, good to know that this is not something I am doing wrong.
> >
> >
> >> - Is the code open source somewhere, e.g., on github?
> >>
> > No, because it is test code only right now.
> >
> >> Best,
> >>
> >> Peter
> >>
> >> On 14.01.2016 at 16:13, Bonnie MacKellar wrote:
> >>> Hi,
> >>>
> >>> I just spent the last 4 days stumbling through the documentation,
> >>> tutorials, posts to this mailing list, and any code examples I could
> >>> find on the Internet, so I could integrate the Metamap annotator and a
> >>> RUTA script in Java using UimaFit. I succeeded, and I have something
> >>> that runs, but I doubt I am organizing things the best way in Eclipse,
> >>> and in particular, I am noticing some odd things if I try to build and
> >>> test the script first in the Ruta development environment in Eclipse
> >>> and then move the script to my Java environment. I suspect my workflow
> >>> is not the best possible, so I am looking for advice on how to manage
> >>> this.
> >>>
> >>> My project was created as a Ruta project so I could have the
> >>> development environment support. I then added Uima nature to the
> >>> project to get the Java development folders. I set up the type
> >>> descriptors for Metamap, and after much reading, realized I needed a
> >>> types.txt file in my source folder that tells the system how to find
> >>> the Metamap type descriptors. I then added the Ruta script to the
> >>> pipeline in my Java class and then copied the type descriptor for that
> >>> down to my source folders as well. Finally, I realized I needed java
> >>> classes for the types, and that pressing a jCasGen button in the
> >>> ComponentDescriptorEditor was the way to do that. However, there are
> >>> some anomalies when I do this.
> >>>
> >>> So, my project has this structure at the top level
> >>>
> >>> Inline image 1
> >>>
> >>> and at the src level, this is the structure. Notice that the Ruta
> >>> script and types have been copied down to this level
> >>>
> >>> Inline image 2
> >>>
> >>>
> >>> The code that creates the AnalysisEngineDescriptors and runs the
> >>> pipeline looks like this (it is in PipelineSystem. java)
> >>>
> >>> try {
> >>> ae = AnalysisEngineFactory.createEngine(
> >>>     gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
> >>> AnalysisEngineDescription mmEngineDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(
> >>>         gov.nih.nlm.nls.metamap.uima.MetaMapAnnotator.class);
> >>> AnalysisEngine rae =
> >>>     AnalysisEngineFactory.createEngine(RutaEngine.class,
> >>>         RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
> >>> AnalysisEngineDescription rutaEngineDesc =
> >>>     AnalysisEngineFactory.createEngineDescription(RutaEngine.class,
> >>>         RutaEngine.PARAM_MAIN_SCRIPT, "testrules");
> >>> JCas jCas = ae.newJCas();
> >>> jCas.setDocumentText("serum albumin greater or equal 2g/dL");
> >>> SimplePipeline.runPipeline(jCas, mmEngineDesc, rutaEngineDesc);
> >>> displayResults(jCas);
> >>> displayRutaResults(jCas);
> >>>
> >>>
> >>> and the types.txt file contains this
> >>> classpath*:desc/types/MetaMapApiTypeSystem.xml
> >>> classpath*:desc/types/BasicTypeSystem.xml
> >>> classpath*:desc/types/InternalTypeSystem.xml
> >>> classpath*:desc/types/testrulesTypeSystem.xml
> >>>
> >>>
> >>> If I want to use the Ruta Workbench to develop my Ruta script, it
> >>> appears that I have to regenerate the java type files, such as
> >>> Relational.java, each time I make a change. Is that correct?
> >>> And when I do this, I notice that it completely regenerates the
> >>> org.apache.uima.ruta.type hierarchy, which leads to an odd runtime
> >>> error  (NoSuchMethodException, caused by trying to call
> >>> setLowMemoryProfile). I read a chain on this list about this error
> >>> which recommended to delete the regenerated uima type hierarchy. This
> >>> worked, but it seems I have to go through these steps every time I
> >>> regenerate the Ruta types, which is a pain.
> >>>
> >>> Also, I notice that the metamap type hierarchy is also regenerated
> >>> inside my project. I theorize it is because of the import in my Ruta
> >>> type descriptor
> >>> TYPESYSTEM BasicTypeSystem;
> >>> TYPESYSTEM BasicMetaMapTypeSystem;
> >>> TYPESYSTEM MetaMapApiTypeSystem;
> >>> DECLARE Relational,UMLSConcept;
> >>> Candidate{ -> MARK(UMLSConcept)};
> >>>
> >>> is this not the right way to make my script aware of the Metamap types?
> >>>
> >>> I also notice that in the type descriptor, this import is generated twice
> >>> <imports>
> >>>         <import location="BasicTypeSystem.xml"/>
> >>>         <import location="BasicTypeSystem.xml"/>
> >>>     </imports>
> >>>
> >>> In general, is it a good or bad idea to develop the Ruta script in the
> >>> workbench and then copy its pieces into the Java source folder? It
> >>> seems like a very convoluted process.
> >>>
> >>> Thanks for your help
> >>>
> >>> Bonnie MacKellar
> >>
>
>
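
For reference, Peter's remark that the JCasGen maven plugin supports the
same "limit to project" option as the checkbox could look roughly like the
pom.xml fragment below. This is only a sketch and not configuration taken
from the thread: the type system path and the version property are
placeholders, and it assumes the project's own type system descriptor lives
under src/main/resources while BasicTypeSystem comes from a dependency.

```xml
<!-- Sketch of a JCasGen setup in a maven-built project (assumed layout). -->
<plugin>
  <groupId>org.apache.uima</groupId>
  <artifactId>jcasgen-maven-plugin</artifactId>
  <!-- Placeholder: use the UIMA version your project depends on. -->
  <version>${uima.version}</version>
  <executions>
    <execution>
      <goals>
        <goal>generate</goal>
      </goals>
      <configuration>
        <typeSystemIncludes>
          <!-- Hypothetical path: the descriptor holding only your own types. -->
          <typeSystemInclude>src/main/resources/desc/types/MyTypeSystem.xml</typeSystemInclude>
        </typeSystemIncludes>
        <!-- Generate classes only for types defined inside this project,
             so imported types such as RutaBasic are not regenerated. -->
        <limitToProject>true</limitToProject>
      </configuration>
    </execution>
  </executions>
</plugin>
```

As Peter notes, the limit only helps if the imported type systems are not
copied into the project itself; with the descriptors resolved from the
classpath, only the local types should end up as generated JCas classes.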
