manifoldcf-dev mailing list archives

From "Vinay (JIRA)" <>
Subject [jira] [Commented] (CONNECTORS-1494) Error crawling file system with file names having special characters.
Date Thu, 08 Feb 2018 12:37:00 GMT


Vinay commented on CONNECTORS-1494:

Thanks, Karl. Although the above solution partially fixes the issue, ManifoldCF still does
not pick up files with names like "a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf"
when run from a Linux machine. There are no errors in the logs.

If the same file is copied to a Windows machine and crawled by ManifoldCF on Windows, the file
is picked up. Any idea why such files are not picked up when running on Linux? With
no errors on the console, we are unable to work out the cause.
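
One thing that may be worth checking (an assumption, not something confirmed in this thread) is the charset the JVM uses to decode file names on the Linux host. On HotSpot/OpenJDK this is reported by the JDK-internal `sun.jnu.encoding` property, which is derived from the process locale. If it is an ASCII-only charset, names with non-ASCII characters may not round-trip. The class name `FileNameEncodingCheck` and the sample name below are hypothetical; this is a minimal diagnostic sketch, not part of ManifoldCF.

```java
import java.nio.charset.Charset;

public class FileNameEncodingCheck {

    // Returns true if every character of the name can be encoded in the given charset.
    public static boolean canEncode(String name, Charset charset) {
        return charset.canEncode() && charset.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // sun.jnu.encoding is JDK-internal (HotSpot/OpenJDK) and may be absent on other JVMs.
        String jnu = System.getProperty("sun.jnu.encoding");
        System.out.println("sun.jnu.encoding = " + jnu);
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());

        // Fall back to the default charset if the property is missing or unsupported.
        Charset fileNameCharset = (jnu != null && Charset.isSupported(jnu))
                ? Charset.forName(jnu)
                : Charset.defaultCharset();

        // Hypothetical sample name; pass the real file name as the first argument instead.
        String sample = args.length > 0 ? args[0] : "a_\u00fa_\u00eb.pdf";
        System.out.println("name encodable in " + fileNameCharset + ": "
                + canEncode(sample, fileNameCharset));
    }
}
```

Running this with the same user and environment as the ManifoldCF agents process on both the Linux and Windows machines would show whether the two JVMs see different file name charsets.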

> Error crawling file system with file names having special characters.
> ---------------------------------------------------------------------
>                 Key: CONNECTORS-1494
>                 URL:
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: File system connector
>    Affects Versions: ManifoldCF 2.9.1
>            Reporter: Vinay
>            Assignee: Karl Wright
>            Priority: Major
> I am crawling a file system mounted on a Linux machine, so the Repository Connection is
> of type "File System". ManifoldCF does not pick up some files whose names contain
> special characters.
> File ex: a_XY-SMnA_ABC_Uuޓࠚϯmӣܼ˵Ҫȳ_֚3ҿؖúشԃԫхրҠë.pdf
> exception: java.lang.NumberFormatException: For input string: ""
>      at java.lang.NumberFormatException.forInputString(
>      at java.lang.Long.parseLong( ~[?:1.8.0_151]
>      at java.lang.Long.<init>( ~[?:1.8.0_151]
>      at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter$SpecPacker.<init>(
>      at org.apache.manifoldcf.agents.transformation.documentfilter.DocumentFilter.getPipelineDescription(
>      at org.apache.manifoldcf.agents.incrementalingest.IncrementalIngester.getTransformationDescription(
>      at org.apache.manifoldcf.crawler.system.PipelineSpecification.<init>(
>      at
>  FATAL 2018-02-07T23:47:15,927 (Worker thread '2') - Error tossed: For input string:
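
For reference, the trace above passes through `Long.<init>` and `Long.parseLong`, and the message `For input string: ""` is what the JDK reports when an empty string is parsed as a long. The sketch below reproduces only that JDK behaviour; the class `EmptyLongDemo` and the helper `parseOrNull` are hypothetical and do not reflect how `DocumentFilter` was actually fixed.

```java
public class EmptyLongDemo {

    // Returns the exception message produced when parsing s as a long, or null on success.
    public static String parseMessage(String s) {
        try {
            Long.parseLong(s);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    // Hypothetical defensive helper: treats a null or empty value as "not set".
    public static Long parseOrNull(String s) {
        if (s == null || s.isEmpty()) {
            return null;
        }
        return Long.valueOf(s);
    }

    public static void main(String[] args) {
        System.out.println("message for empty input: " + parseMessage(""));
        System.out.println("parseOrNull(\"\")   = " + parseOrNull(""));
        System.out.println("parseOrNull(\"42\") = " + parseOrNull("42"));
    }
}
```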

This message was sent by Atlassian JIRA
