manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-956) Field names are URL encoded
Date Wed, 17 Sep 2014 12:58:33 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137177#comment-14137177 ]

Karl Wright commented on CONNECTORS-956:
----------------------------------------

Another person has opened a ticket that is a duplicate of this one.

This ticket has not been fixed because SolrJ still generates illegal XML when arbitrary
characters are used in field names.  SOME encoding is therefore essential for field names
to be transmitted to Solr correctly.  The Solr/Lucene team also tightly restricts the
characters that can be used in field names, so even if this problem is fixed in MCF, there's
a good chance you still won't be able to use whatever funky field name your repository
connector comes up with in Solr itself.

Given all that, I still believe that URL encoding is probably too restrictive, in that some
characters which are legal in field names wind up getting encoded, so we can try to introduce
an option for a different encoding.  But this is unlikely to satisfy everybody, since the
problem is fundamentally a Solr restriction.
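
One possible shape for such an option, as a minimal sketch only: escape just the code
points that cannot appear in an XML 1.0 document at all (plus the escape character itself)
and pass everything else through.  The class name, method names, and the %XXXX escape format
are hypothetical and are not part of ManifoldCF or SolrJ.  The sketch addresses only the
XML-legality concern, not Solr's own field-name rules.

```java
final class MinimalFieldNameEncoder {
    // XML 1.0 "Char" production: the code points allowed in an XML document.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical scheme: '%' and XML-illegal code points become %XXXX
    // (four uppercase hex digits); every other code point is copied as-is.
    static String encode(String fieldName) {
        StringBuilder out = new StringBuilder(fieldName.length());
        for (int i = 0; i < fieldName.length(); ) {
            int cp = fieldName.codePointAt(i);
            if (cp == '%' || !isXmlChar(cp)) {
                out.append(String.format("%%%04X", cp));
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // An Alfresco-style name contains no XML-illegal characters, so it is unchanged.
        System.out.println(encode("{http://www.alfresco.org/model/system}store_identifier"));
        // A control character is escaped: prints a%0001b
        System.out.println(encode("a\u0001b"));
    }
}
```

Every escaped code point fits in four hex digits, because the XML-illegal code points all lie
at or below U+FFFF, so the escapes stay fixed-width and could be decoded unambiguously.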


> Field names are URL encoded
> ---------------------------
>
>                 Key: CONNECTORS-956
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-956
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.6.1
>            Reporter: Piergiorgio Lucidi
>            Assignee: Karl Wright
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The field names provided by some repositories, such as Alfresco, are based on a URI similar to:
> {code}
> {http://www.alfresco.org/model/system/1.0}store_identifier
> {code}
> But in Solr we found the following field name:
> {code}
> http_3a_2f_2fwww_alfresco_org_2fmodel_2fsystem_2f1_0_7dstore_identifier
> {code}
> The code involved in the Solr connector is the following:
> {code}
> protected static String preEncode(String fieldName)
>   {
>       return URLEncoder.encode(fieldName);
>   }
> {code}
> We should probably solve this by removing the preEncode invocation.
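
To make the effect of the quoted preEncode concrete, here is a small standalone sketch of
what java.net.URLEncoder does to an Alfresco-style name.  The class name is hypothetical,
and the sample name assumes the namespace includes the /1.0 segment, as the _2f1_0_7d in
the reported Solr field name suggests.  It passes an explicit charset because the
one-argument URLEncoder.encode(String) used in the quoted code is deprecated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncodedFieldNameDemo {
    // Same idea as the quoted preEncode, with an explicit UTF-8 charset.
    static String preEncode(String fieldName) {
        try {
            return URLEncoder.encode(fieldName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String alfrescoName = "{http://www.alfresco.org/model/system/1.0}store_identifier";
        // Prints: %7Bhttp%3A%2F%2Fwww.alfresco.org%2Fmodel%2Fsystem%2F1.0%7Dstore_identifier
        System.out.println(preEncode(alfrescoName));
    }
}
```

The braces, colon, and slashes are all percent-encoded, while letters, digits, '.', and '_'
pass through.  The reported Solr name has underscores where this output has '%' and '.',
which suggests a further normalization step that the ticket does not describe.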



