uima-user mailing list archives

From Chris Roeder <chris.roe...@ucdenver.edu>
Subject Re: How to create and use a repository for UIMA annotators?
Date Wed, 02 Mar 2011 16:13:56 GMT

I find the idea of enabling maven by packaging components in jars very 
compelling. Have you dealt with third-party code that expects to find
resources from file system locations rather than classpath names?
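
One workaround for that situation, sketched here only as an illustration and
not as something DKPro Core or uimaFIT is known to do, is to copy the classpath
resource to a temporary file and hand the resulting path to the code that
insists on a file system location. The class and method names below
(ClasspathToFile, extractToTempFile) are hypothetical; only JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ClasspathToFile {

    /**
     * Copies a classpath resource (for example one packaged inside a JAR)
     * to a temporary file and returns the path of that file, so that code
     * expecting a file system location can read it.
     */
    public static Path extractToTempFile(ClassLoader loader, String resourceName)
            throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Path tmp = Files.createTempFile("resource-", ".tmp");
            // Ask the JVM to remove the copy on normal exit.
            tmp.toFile().deleteOnExit();
            // createTempFile already created an empty file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }
}
```

The returned path can then be passed (via toString()) to a component parameter
that expects a file name. The cost is one extra copy per resource, which may
matter for large models.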


On 3/2/11 12:49 AM, Richard Eckart de Castilho wrote:
> Hello Greg,
>> It's sort of a "maven-like" model (i.e. when using a Nexus server).  Or
>> maybe I should just actually use maven and nexus?
>> Has anyone out there tried to create a "UIMA Repository" that can be
>> directly referenced from a component descriptor file?  How did you make it
>> work?
> We consider ourselves to have a "UIMA Repository" based on Maven - cf. DKPro Core http://code.google.com/p/dkpro-core-asl/
> I would like to point out that we have largely abandoned static UIMA
> descriptors (except type descriptors).
> We feel very comfortable programming on the Java level, dynamically creating
> descriptors using uimaFIT and running our pipelines directly from within
> Java (no CPE GUI or such).
> For this scenario, Maven works like a charm for us. We do not even worry too
> much about type systems, because we have packaged their XML descriptors and
> JCas wrappers in JARs as well and can simply add them as Maven dependencies.
> We use uimaFIT's automatic type system detection feature to dynamically
> construct a global type system description from all type system description
> files that could be found in a well-defined location in the classpath (that
> is, in the aforementioned JARs). A short example:
>    * add dependency on de.tudarmstadt.ukp.dkpro.core.io.text-asl (for TextReader)
>    * add dependency on de.tudarmstadt.ukp.dkpro.core.tokit-asl (for BreakIteratorSegmenter)
>    * add dependency on de.tudarmstadt.ukp.dkpro.core.dictionaryannotator-asl (for DictionaryAnnotator)
>    * dependency on uimaFIT automatically added (for CASDumpWriter)
>    * dependencies on type systems and JCas wrappers automatically added by Maven
> Then we can immediately assemble and run a pipeline:
>      CollectionReader reader = createCollectionReader(TextReader.class,
>          TextReader.PARAM_PATH, "src/test/resources/text",
>          TextReader.PARAM_PATTERNS, new String[] { "[+]*.txt", "[-]broken.txt" },
>          TextReader.PARAM_LANGUAGE, "en");
>      AnalysisEngine tokenizer = createPrimitive(BreakIteratorSegmenter.class);
>      AnalysisEngine nameFinder = createPrimitive(DictionaryAnnotator.class,
>          DictionaryAnnotator.PARAM_PHRASE_FILE, "src/test/resources/dictionaries/names.txt",
>          DictionaryAnnotator.PARAM_ANNOTATION_TYPE, Name.class.getName());
>      AnalysisEngine writer = createPrimitive(CASDumpWriter.class,
>          CASDumpWriter.PARAM_OUTPUT_FILE, "target/output.txt");
>      SimplePipeline.runPipeline(reader, tokenizer, nameFinder, writer);
> Notice that no line references a type system whatsoever. This is because we
> let uimaFIT automatically scan the classpath and simply make all types it
> finds available to every created component.
> Our approach seems to work great for our researchers to assemble and run
> pipelines on a single machine. We do not currently scale out UIMA.
> Cheers,
> Richard
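
The classpath scanning described in the quoted message can be illustrated with
plain JDK calls. The following is a rough sketch of the general mechanism only,
not uimaFIT's actual implementation: it assumes each JAR ships a small index
file at a fixed location that lists its type system descriptors, one per line.
The location META-INF/example/types.txt and the names DescriptorScanner and
findDescriptorNames are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class DescriptorScanner {

    // Hypothetical well-known location; not necessarily the one uimaFIT uses.
    public static final String INDEX_LOCATION = "META-INF/example/types.txt";

    /**
     * Collects the descriptor names listed in every index file that the given
     * class loader can see at INDEX_LOCATION (one index per JAR or directory).
     */
    public static List<String> findDescriptorNames(ClassLoader loader) throws IOException {
        List<String> names = new ArrayList<>();
        // getResources returns every match on the classpath, not just the first.
        Enumeration<URL> indexes = loader.getResources(INDEX_LOCATION);
        while (indexes.hasMoreElements()) {
            URL index = indexes.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(index.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty()) {
                        names.add(line);
                    }
                }
            }
        }
        return names;
    }
}
```

Because every JAR contributes its own index under the same name, adding a Maven
dependency is enough to make its descriptors discoverable; nothing in the
pipeline code has to name them.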

Christophe (Chris) Roeder
Software Developer, Professional Research Assistant
Center for Computational Pharmacology, University of Colorado Denver
12801 E 17th Ave, MS 8303,  Aurora, CO 80045 USA
chris.roeder@ucdenver.edu / tel: (303) 724-7574
