uima-user mailing list archives

From Richard Eckart de Castilho <eckar...@tk.informatik.tu-darmstadt.de>
Subject Re: How to create and use a repository for UIMA annotators?
Date Thu, 03 Mar 2011 09:00:29 GMT
Hi Chris,

> I find the idea of enabling maven by packaging components in jars very 
> compelling. Have you dealt with third-party code that expects to find
> resources from file system locations rather than classpath names?


I am not completely sure what you mean, so I hope my answer will satisfy you.

If you want to build a pipeline around a third-party component that cannot deal with URLs,
DKPro Core includes ResourceUtils.getUrlAsFile() [1], which bridges that gap for you and even
supports caching. To resolve "classpath:" URLs, you can use ResourceUtils.resolveLocation().

   AnalysisEngine ae = createPrimitive(ThirdPartyAE.class,
       ThirdPartyAE.PARAM_RESOURCE_FILE,
       getUrlAsFile(resolveLocation("classpath:/my/packaged/resource.bin"), true).getAbsolutePath());
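
To make the idea behind that bridge concrete, here is a minimal sketch that copies whatever is behind a URL to a temporary file, so that a component expecting a file system path can read it. This is not the DKPro Core implementation: the class and method names are made up for illustration, and the sketch does no caching.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of DKPro Core. It only illustrates how a
// resource addressed by a URL can be handed to code that needs a file path.
class UrlToFileSketch {

    /** Copies the content behind the URL to a temporary file and returns that file. */
    static File copyToTempFile(URL url) throws IOException {
        File tmp = File.createTempFile("resource", ".bin");
        // Ask the JVM to remove the copy on normal exit (assumption: no caching wanted).
        tmp.deleteOnExit();
        try (InputStream in = url.openStream()) {
            // createTempFile already created an empty file, so overwrite it.
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp;
    }
}
```

For a resource packaged in a JAR, the URL would typically come from something like getClass().getResource("/my/packaged/resource.bin"), and the absolute path of the returned file is what gets passed to the third-party component.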

If the third-party component supports the UIMA ResourceLoader, you should be able to configure
that to resolve resources from the file-system.

Some of the components we have implemented support loading resources from the classpath. This
means we can package resources like tagging models as JARs and add them as Maven dependencies
as well. 

DKPro includes Ant scripts that automatically create such JARs for TreeTagger models and binaries
as well as for models of the Stanford Parser and NER. The generated JARs can be uploaded to
a Maven repository and added to a project just like that (due to license restrictions, not
to a public repository). The TreeTagger component is intelligent enough to load the correct
model just by looking at the document language set in the CAS. The Stanford Parser and NER
components currently can't do that; for those you have to specify a model URL such as
"classpath:/resource/Classifiers/FaruquiPado/hgc_GERMAN_175M.ser.gz" (cf. [2]).
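
The difference between the two approaches can be sketched roughly as follows. This is only an illustration of the lookup logic, not how the TreeTagger or Stanford components are actually implemented; the class name, the method names and the registered location are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: choose a model location from the document language
// unless the caller configured an explicit location.
class ModelLocatorSketch {

    private final Map<String, String> locationByLanguage = new HashMap<>();

    /** Registers the model location to use for a language code such as "de". */
    void register(String language, String location) {
        locationByLanguage.put(language, location);
    }

    /**
     * Returns the explicit location if one is given; otherwise looks up the
     * location registered for the document language.
     */
    String resolve(String documentLanguage, String explicitLocation) {
        if (explicitLocation != null) {
            return explicitLocation;
        }
        String location = locationByLanguage.get(documentLanguage);
        if (location == null) {
            throw new IllegalArgumentException(
                    "No model registered for language: " + documentLanguage);
        }
        return location;
    }
}
```

A component that always needs a model URL corresponds to calling resolve() with the explicit location set; a language-aware component corresponds to passing null and relying on the language from the CAS.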

DKPro also includes a powerful base class for CollectionReaders that uses the Spring PathMatchingResourcePatternResolver
[3], which is also used by uimaFIT for automatic type detection. ResourceCollectionReaderBase
[4] allows you to easily create CollectionReaders capable of loading data from the file system
or the classpath (or any other location/URL supported by the Spring Resource framework) using
Ant-like inclusion/exclusion patterns. For example, our TextReader is built on it:

    CollectionReader reader = createCollectionReader(TextReader.class,
        TextReader.PARAM_PATH, "classpath:/data",
        TextReader.PARAM_PATTERNS, new String[] { "[+]text/**/*.txt", "[-]**/broken.txt" },
        TextReader.PARAM_LANGUAGE, "en");
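
To illustrate how the "[+]" and "[-]" prefixes act as include and exclude rules, here is a small stand-alone sketch. It is not the ResourceCollectionReaderBase code and it does not use Spring; it relies on the JDK glob matcher instead, and the class name is made up. Note that JDK globs are not identical to Ant patterns: with the JDK matcher, "text/**/*.txt" requires at least one directory below "text".

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of prefix-based include/exclude filtering.
class PatternFilterSketch {

    private final List<PathMatcher> includes = new ArrayList<>();
    private final List<PathMatcher> excludes = new ArrayList<>();

    PatternFilterSketch(String... patterns) {
        for (String pattern : patterns) {
            if (pattern.startsWith("[+]")) {
                includes.add(glob(pattern.substring(3)));
            } else if (pattern.startsWith("[-]")) {
                excludes.add(glob(pattern.substring(3)));
            } else {
                // Assumption of this sketch: every pattern must carry a prefix.
                throw new IllegalArgumentException("Missing [+] or [-] prefix: " + pattern);
            }
        }
    }

    private static PathMatcher glob(String pattern) {
        return FileSystems.getDefault().getPathMatcher("glob:" + pattern);
    }

    /** A path is accepted if some include matches it and no exclude does. */
    boolean accepts(String relativePath) {
        Path path = Paths.get(relativePath);
        boolean included = includes.stream().anyMatch(m -> m.matches(path));
        boolean excluded = excludes.stream().anyMatch(m -> m.matches(path));
        return included && !excluded;
    }
}
```

With the two patterns from the TextReader example, a path like "text/a/b.txt" is accepted, while "text/a/broken.txt" is rejected by the exclude rule and "other/a/b.txt" is rejected because no include rule matches it.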

Best,

Richard

[1] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.resources/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/resources/ResourceUtils.java
[2] http://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp/src/test/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordNamedEntityRecognizerTest.java
[3] http://static.springsource.org/spring/docs/2.5.x/api/org/springframework/core/io/support/PathMatchingResourcePatternResolver.html
[4] http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.io/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/io/ResourceCollectionReaderBase.java

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone +49 (6151) 16-7477, fax -5455, room S2/02/E225
eckartde@tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
------------------------------------------------------------------- 