uima-user mailing list archives

From Richard Eckart de Castilho <...@apache.org>
Subject Re: Example of Tika FilesystemReader working with uimaFIT?
Date Thu, 01 Dec 2016 01:09:47 GMT
Hi,

you can set up a "types.txt" file as documented here [1] to
point uimaFIT to the type system descriptor that contains the missing
annotation type.
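
As a rough sketch of that first option: uimaFIT scans the classpath for
files named META-INF/org.apache.uima.fit/types.txt and reads one type
system descriptor location per line. The pattern below is only a
placeholder (an assumption, not your actual layout); point it at wherever
the descriptor declaring SourceDocumentInformation sits on your classpath.

```text
classpath*:desc/types/**/*.xml
```

With such a file on the classpath, the factory call can stay as it is,
because the type system is picked up automatically.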

Alternatively, you can load your type system description
in code and pass it after the class argument to createCollectionReader,
e.g. 

  TypeSystemDescription tsd = TypeSystemDescriptionFactory.createTypeSystemDescriptionFromPath(
    "path/to/your/typesystem.xml");
  CollectionReader readerEngine = CollectionReaderFactory.createCollectionReader(
    FileSystemCollectionReader.class, tsd, ... params ...);

Cheers,

-- Richard

[1] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.typesystem

> On 26.11.2016, at 22:26, Paul Browne <paulb@firstpartners.net> wrote:
> 
> Folks,
> 
> Wondering if there are any samples of using the Uima component Tika
> FilesystemReader working with uimaFIT?
> 
> I've been playing around with it, getting several errors (probably my
> fault) but can't seem to find a similar example on the website / mailing
> list despite a search. Have downloaded and compiled source (Uima, Uima
> tools, examples); existing code is clear but when I try to combine them to
> do the following outline I get errors.
> 
> Aim is to:
> 1) Read a collection of documents using the Uima component Tika
> FilesystemReader
> 2) Later, do more serious POS tagging.
> 
> The code is:
> 
>    CollectionReader readerEngine =
> CollectionReaderFactory.createCollectionReader(FileSystemCollectionReader.class,
>                FileSystemCollectionReader.PARAM_INPUTDIR,
>                "C:\\Somelocation",
>                FileSystemCollectionReader.PARAM_ENCODING, "UTF-8",
>                FileSystemCollectionReader.PARAM_LANGUAGE, "EN");
> 
> AggregateBuilder builder = new AggregateBuilder();
> 
> SimplePipeline.runPipeline(readerEngine, builder.createAggregate());
> 
> And the error is
> Exception in thread "main" org.apache.uima.cas.CASRuntimeException: JCas
> type "org.apache.uima.examples.SourceDocumentInformation" used in Java
> code,  but was not declared in the XML type descriptor.
> 
> Similar error referenced at link below, but not clear how to implement the
> suggested fix
> http://user.uima.apache.narkive.com/b940cOrO/how-to-test-a-collectionreader
> 
> Any suggestions or pointers on the web that I should be looking at?
> 
> Thanks for your help
> 
> Paul