uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Not seeing the document names in the Document Analyzer
Date Tue, 22 Mar 2011 17:54:27 GMT
OK - found the problem.

The Document Analyzer uses a component, "FileSystemCollectionReader", to read the
files. This component stores the location of the file being read in the CAS,
using the code:

      // Also store location of source document in CAS. This information is
      // critical if CAS Consumers will need to know where the original
      // document contents are located.
      // For example, the Semantic Search CAS Indexer writes this information
      // into the search index that it creates, which allows applications that
      // use the search index to locate the documents that satisfy their
      // semantic queries.
      SourceDocumentInformation srcDocInfo = new SourceDocumentInformation(jcas);
      srcDocInfo.setUri(file.getAbsoluteFile().toURL().toString());

This last line takes the absolute path of the source file, which in your case
is under

C:\Watson\UIMA sdk\apache-uima\examples\data

and the toURL conversion turns the blank into "%20".

The serialization code then fails when it attempts to create the file name from
that value, and as a result, the default file name is used.

I could reproduce this by making the source directory have a blank in it.

You can avoid this issue by pointing the Document Analyzer at a source directory
whose path name contains no blanks.
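
For anyone who wants to see the encoding mismatch in isolation, here is a
minimal sketch. It is not the Document Analyzer or XmiWriterCasConsumer code.
It assumes File.toURI() as a stand-in for the conversion quoted above, and the
class name, method names, and sample path are hypothetical. It shows that the
encoded form carries "%20", that the raw path read back from it still contains
"%20", and that going through java.net.URI restores the blank.

```java
import java.io.File;
import java.net.URI;

public class BlankInPathDemo {

  // Encode a file location as a URI string. File.toURI() escapes a blank
  // in the path as "%20". (Stand-in for the conversion in the quoted code.)
  public static String encode(File file) {
    return file.getAbsoluteFile().toURI().toString();
  }

  // Read the path back without decoding: "%20" is still present, so a file
  // name built from this string does not match the real directory.
  public static String rawPath(String uri) {
    return URI.create(uri).getRawPath();
  }

  // Read the location back through java.net.URI: new File(URI) decodes
  // the escapes, so the blank is restored.
  public static File decode(String uri) {
    return new File(URI.create(uri));
  }

  public static void main(String[] args) {
    // Hypothetical relative path with a blank in the directory name.
    File source = new File("UIMA sdk", "example.txt");
    String uri = encode(source);

    System.out.println("encoded : " + uri);
    System.out.println("raw path: " + rawPath(uri));
    System.out.println("decoded : " + decode(uri).getPath());
  }
}
```

If the consuming side reads the stored value the way decode() does, a blank in
the directory name should round-trip; if it uses the raw form, it will not.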

Cheers. -Marshall


On 3/22/2011 1:09 PM, Marshall Schor wrote:
>
> On 3/22/2011 12:25 PM, Marshall Schor wrote:
>> Here's an idea:
>>
>> The suffix doc1.xmi doc2.xmi, etc are produced when the XMI Cas Serializer is
>> called with a null file name:
>>
>> uimaj-examples/src/main/java/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java
>>
>> line 108-110:
>>     if (outFile == null) {
>>       outFile = new File(mOutputDir, "doc" + mDocNum++ + ".xmi");    
>>     }
>>
>> The code above that has a try block that might be getting tripped up by the fact
>> that your install point is in a path with a blank in it.
>>
>> Can you try installing into a path without a blank?
> I tried this, and it also worked (with blanks in the file path) - so that's not
> it...
>
> I'll contact you off-list to debug this mystery. -Marshall
>> -Marshall
>>
>> On 3/22/2011 8:48 AM, Bob Sizemore wrote:
>>> Anybody have any ideas for me to try to get the doc analyzer showing the right
>>> document names?
>>>
>>>
>>>
>
