uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: Not seeing the document names in the Document Analyzer
Date Wed, 23 Mar 2011 14:09:59 GMT
Now posted as a Jira: https://issues.apache.org/jira/browse/UIMA-2097

-Marshall

On 3/22/2011 1:54 PM, Marshall Schor wrote:
> OK - found the problem.
>
> The Document analyzer uses a component "FileSystemCollectionReader" to read the
> files. This component inserts into the CAS the name of the file being read,
> using the code:
>
>       // Also store location of source document in CAS. This information is critical
>       // if CAS Consumers will need to know where the original document contents are located.
>       // For example, the Semantic Search CAS Indexer writes this information into the
>       // search index that it creates, which allows applications that use the search index to
>       // locate the documents that satisfy their semantic queries.
>       SourceDocumentInformation srcDocInfo = new SourceDocumentInformation(jcas);
>       srcDocInfo.setUri(file.getAbsoluteFile().toURL().toString());
>
> This last line gets the source file name, in your case
>
> C:\Watson\UIMA sdk\apache-uima\examples\data
>
> and the toURL converts the "blank" to "%20"
>
> which then causes the serialization code to fail when it attempts to create the file
> name, and as a result, the default file name is used.
>
> I could reproduce this by making the source directory have a blank in it.
>
> You can avoid this issue by pointing the Document Analyzer at a source directory
> whose path name contains no blanks.
>
> Cheers. -Marshall
>
>
> On 3/22/2011 1:09 PM, Marshall Schor wrote:
>> On 3/22/2011 12:25 PM, Marshall Schor wrote:
>>> Here's an idea:
>>>
>>> The names doc1.xmi, doc2.xmi, etc. are produced when the XMI Cas Serializer is
>>> called with a null file name:
>>>
>>> uimaj-examples/src/main/java/org/apache/uima/examples/xmi/XmiWriterCasConsumer.java
>>>
>>> line 108-110:
>>>     if (outFile == null) {
>>>       outFile = new File(mOutputDir, "doc" + mDocNum++ + ".xmi");    
>>>     }
>>>
>>> The code above that has a try block that might be getting tripped up by the fact
>>> that your install point is in a path with a blank in it.
>>>
>>> Can you try installing into a path without a blank?
>> I tried this, and it also worked (with blanks in the file path) - so that's not
>> it...
>>
>> I'll contact you off-list to debug this mystery. -Marshall
>>> -Marshall
>>>
>>> On 3/22/2011 8:48 AM, Bob Sizemore wrote:
>>>> Anybody have any ideas for me to try to get the doc analyzer showing the right
>>>> document names?
>>>>
>>>>
>>>>
>
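
The blank-to-"%20" effect described in the quoted diagnosis can be reproduced outside UIMA. The following is a minimal sketch, not the UIMA code: the class name, helper names, and the "UIMA sdk/my doc.txt" path are hypothetical, and it builds the URL with File.toURI().toURL() (an assumption, chosen because toURI() is documented to escape characters that are illegal in URLs) rather than the toURL() call quoted above. It shows that URL.getPath() hands back the escaped name, while URI.getPath() decodes it.

```java
import java.io.File;
import java.net.URI;

class BlankInPathDemo {

    // File name taken from URL.getPath(), which does NOT decode percent-escapes.
    static String rawName(String urlString) throws Exception {
        String path = URI.create(urlString).toURL().getPath();
        return new File(path).getName();
    }

    // File name taken from URI.getPath(), which decodes percent-escapes.
    static String decodedName(String urlString) throws Exception {
        String path = new URI(urlString).getPath();
        return new File(path).getName();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical path containing blanks; the file does not need to exist.
        File source = new File("UIMA sdk", "my doc.txt");

        // toURI() escapes each blank as %20 before the URL is formed.
        String url = source.getAbsoluteFile().toURI().toURL().toString();

        System.out.println(url);
        System.out.println(rawName(url));     // my%20doc.txt
        System.out.println(decodedName(url)); // my doc.txt
    }
}
```

Code that needs a File back from such a string can also use new File(new URI(urlString)), which accepts an absolute file: URI and decodes the escapes.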
