manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1434) Bad characters in file name can cause Solr 500 errors
Date Wed, 21 Jun 2017 10:04:00 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057282#comment-16057282 ]

Karl Wright commented on CONNECTORS-1434:
-----------------------------------------

It appears that HttpClient does not escape the form name or the body content in any way.
The file name appears as the title of the body content in the multipart area, and it also
appears, surrounded by double quotes, in the content type of the response. The file name
that gets passed in would therefore have to be legal in both of those contexts.
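
To illustrate the point, here is a minimal sketch of what an unescaped name does to a
double-quoted header parameter, and of one way the name could be cleaned up before it is
handed to FormBodyPart. It assumes the name ends up in a header of the usual
multipart/form-data shape (Content-Disposition with name="..." and filename="...").
PartNameSanitizer, sanitize() and contentDisposition() are hypothetical names that exist
in neither ManifoldCF nor HttpClient, and the set of characters treated as unsafe is an
assumption, not something established in this thread.

```java
// Hypothetical helper; not part of ManifoldCF or HttpClient.
final class PartNameSanitizer {
  private PartNameSanitizer() {}

  /**
   * Replaces characters assumed to be unsafe inside a double-quoted header
   * parameter: double quotes, backslashes, control characters (including
   * CR/LF) and anything outside printable ASCII. Spaces are kept here;
   * whether they also need replacing is not settled by this sketch.
   */
  static String sanitize(String name) {
    if (name == null) {
      return "";
    }
    StringBuilder sb = new StringBuilder(name.length());
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean safe = c >= 0x20 && c < 0x7f && c != '"' && c != '\\';
      sb.append(safe ? c : '_');
    }
    return sb.toString();
  }

  /** Builds a header line by plain concatenation, i.e. with no escaping at all. */
  static String contentDisposition(String fieldName, String fileName) {
    return "Content-Disposition: form-data; name=\"" + fieldName
        + "\"; filename=\"" + fileName + "\"";
  }

  public static void main(String[] args) {
    String raw = "my \"draft\".pdf";
    // The embedded quotes end the quoted parameter early and leave a malformed header.
    System.out.println(contentDisposition(raw, raw));
    // After sanitizing, the quoted parameters are well formed again.
    String clean = sanitize(raw);
    System.out.println(contentDisposition(clean, clean));
  }
}
```

In the code under discussion, the same cleaned-up value would have to be passed both as
the FormBodyPart name and as the InputStreamBody file name. Replacing characters changes
the name that Solr sees, so percent-encoding might be preferable if the original name has
to be recoverable.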


> Bad characters in file name can cause Solr 500 errors
> -----------------------------------------------------
>
>                 Key: CONNECTORS-1434
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1434
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 2.7
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.8
>
>
> There are reports that quotes or spaces in a file name can blow up the Solr indexing
> of the document and cause it to throw a 500 error.
> The code in question (from ModifiedHttpSolrClient) is the following:
> {code}
>             String name = content.getName();
>             if (name == null) {
>               name = "";
>             }
>             parts.add(new FormBodyPart(name,
>                 new InputStreamBody(
>                     content.getStream(),
>                     contentType,
>                     content.getName())));
> {code}
> ... where content.getName() would return a name with illegal characters. The question
> is, what does HttpClient do with this name, and should it be escaping it in some way?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
