uima-user mailing list archives

From Roland Cornelissen <rol...@kabelco.nl>
Subject Re: Status Tika Annotator
Date Tue, 16 Feb 2010 18:39:48 GMT
Hi,

> version of the code,
I use:
UIMA 2.3.0
Tika Annotator from the UIMA-Annotator-addons package (2.3.0)
Tika 0.6
> and send us the full stack trace.
java.lang.NullPointerException
	at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:474)
	at org.apache.uima.tika.MarkupHandler.populateCAS(MarkupHandler.java:165)
	at org.apache.uima.tika.TIKAWrapper.populateCASfromURL(TIKAWrapper.java:105)
	at org.apache.uima.tika.FileSystemCollectionReader.getNext(FileSystemCollectionReader.java:99)
	at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
	at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
java.lang.NullPointerException
	at org.apache.uima.cas.impl.CASImpl.createFS(CASImpl.java:474)
	at org.apache.uima.tika.MarkupHandler.populateCAS(MarkupHandler.java:165)
	at org.apache.uima.tika.TIKAWrapper.populateCASfromURL(TIKAWrapper.java:105)
	at org.apache.uima.tika.FileSystemCollectionReader.getNext(FileSystemCollectionReader.java:99)
	at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.readNext(ArtifactProducer.java:494)
	at org.apache.uima.collection.impl.cpm.engine.ArtifactProducer.run(ArtifactProducer.java:711)
uima.tcas.DocumentAnnotation   [Dolphin] AdditionalInfo=31 SortOrder=1 Timestamp=2010,2,16,13,42,48 ViewMode=1
org.apache.uima.tika.MarkupAnnotation   [Dolphin] AdditionalInfo=31 SortOrder=1 Timestamp=2010,2,16,13,42,48 ViewMode=1
org.apache.uima.tika.MarkupAnnotation
org.apache.uima.tika.SourceDocumentAnnotation
org.apache.uima.tika.MarkupAnnotation
org.apache.uima.tika.MarkupAnnotation [Dolphin] AdditionalInfo=31 SortOrder=1 Timestamp=2010,2,16,13,42,48 ViewMode=1


I have a simple test setup where the output is written to an annotation writer.
In this case Tika reads 3 HTML pages, fails on the first 2, and passes on the
annotations from the last one (?). The last lines above, after the stack
traces, are the printed annotations.

I hope this information is more useful.

Roland