Date: Thu, 18 Sep 2014 07:17:34 +0000 (UTC)
From: "Karl Wright (JIRA)"
To: dev@manifoldcf.apache.org
Subject: [jira] [Commented] (CONNECTORS-956) Field names are URL encoded

    [ https://issues.apache.org/jira/browse/CONNECTORS-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138630#comment-14138630 ]

Karl Wright commented on CONNECTORS-956:
----------------------------------------

I did some research as to what happens in SolrJ right at the moment. The key method is SolrServer.request(ContentStreamUpdateRequest cs). In the non-Solr-Cloud case we've overridden it, to fix other bugs, in ModifiedHttpSolrServer (which extends the SolrJ class org.apache.solr.client.solrj.impl.HttpSolrServer).
What this does for Get, Post, and multipart Post is as follows:

Get:
{code}
method = new HttpGet( baseUrl + path + ClientUtils.toQueryString( params, false ) );
{code}

Post:
{code}
if (isMultipart) {
  parts.add(new FormBodyPart(p, new StringBody(v, StandardCharsets.UTF_8)));
} else {
  postParams.add(new BasicNameValuePair(p, v));
}
{code}

Multipart:
{code}
post.setEntity(new UrlEncodedFormEntity(postParams, StandardCharsets.UTF_8));
ModifiedMultipartEntity entity = new ModifiedMultipartEntity(HttpMultipartMode.STRICT, null, StandardCharsets.UTF_8);
for(FormBodyPart p: parts) {
  entity.addPart(p);
}
post.setEntity(entity);
{code}

Not multipart:
{code}
post.setEntity(new UrlEncodedFormEntity(postParams, StandardCharsets.UTF_8));
{code}

I believe multipart post and post are therefore safe against illegal parameter name characters. However, ClientUtils.toQueryString( params, false ) is NOT safe, because it URL-encodes each value but appends each key verbatim:

{code}
public static String toQueryString( SolrParams params, boolean xml ) {
  StringBuilder sb = new StringBuilder(128);
  try {
    String amp = xml ? "&amp;" : "&";
    boolean first=true;
    Iterator<String> names = params.getParameterNamesIterator();
    while( names.hasNext() ) {
      String key = names.next();
      String[] valarr = params.getParams( key );
      if( valarr == null ) {
        sb.append( first?"?":amp );
        sb.append(key);
        first=false;
      }
      else {
        for (String val : valarr) {
          sb.append( first? "?":amp );
          sb.append(key);
          if( val != null ) {
            sb.append('=');
            sb.append( URLEncoder.encode( val, "UTF-8" ) );
          }
          first=false;
        }
      }
    }
  }
  catch (IOException e) {throw new RuntimeException(e);}  // can't happen
  return sb.toString();
}
{code}

I can't override that method, because it's a static and multiple places call it. The best I can do is override the solr server classes that make use of it. That may or may not work; the derivation of (say) org.apache.solr.client.solrj.impl.CloudSolrServer is complex.
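To make the gap in toQueryString concrete, here is a minimal, self-contained sketch of the same key/value handling next to a variant that also encodes keys. It is only an illustration under stated assumptions: the class QueryStringSketch and its methods valuesOnly and keysAndValues are hypothetical names, a plain Map stands in for SolrParams, and the parameter value "workspace" is made up. It is not SolrJ code and not necessarily the fix that will go into trunk.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not SolrJ code.
class QueryStringSketch {

  // URL-encodes one token as UTF-8; the checked exception cannot occur for UTF-8.
  private static String enc(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  // Same shape as the loop quoted above: values are encoded, keys are appended as-is.
  public static String valuesOnly(Map<String, String[]> params) {
    return build(params, false);
  }

  // Variant that encodes the parameter names as well as the values.
  public static String keysAndValues(Map<String, String[]> params) {
    return build(params, true);
  }

  private static String build(Map<String, String[]> params, boolean encodeKeys) {
    StringBuilder sb = new StringBuilder(128);
    boolean first = true;
    for (Map.Entry<String, String[]> entry : params.entrySet()) {
      String key = encodeKeys ? enc(entry.getKey()) : entry.getKey();
      String[] vals = entry.getValue();
      if (vals == null) {
        // A missing value array becomes a single parameter with no '=' part.
        vals = new String[] { null };
      }
      for (String val : vals) {
        sb.append(first ? "?" : "&").append(key);
        if (val != null) {
          sb.append('=').append(enc(val));
        }
        first = false;
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    Map<String, String[]> params = new LinkedHashMap<>();
    // Field name taken from the issue description; the value is a made-up example.
    params.put("{http://www.alfresco.org/model/system}store_identifier",
        new String[] { "workspace" });
    System.out.println("keys verbatim: " + valuesOnly(params));
    System.out.println("keys encoded:  " + keysAndValues(params));
  }
}
```

With keys appended verbatim, characters such as '&', '=', '{' or '/' in a field name end up raw in the query string, which is why only the Get path is a concern here.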
The concern is that we don't control that flow, for the most part, although posts, gets, and multipart posts *do* still go through our ModifiedHttpSolrServer class.

What I propose to do is to break backwards compatibility in trunk, since it's ManifoldCF 2.0 anyway and that is allowed. If the change seems to work there, we can talk about adding a switch in the dev_1x branch.

> Field names are URL encoded
> ---------------------------
>
>                 Key: CONNECTORS-956
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-956
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.6.1
>            Reporter: Piergiorgio Lucidi
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 2.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The field names provided by some repositories such as Alfresco are based on a URI similar to:
> {code}
> {http://www.alfresco.org/model/system}store_identifier
> {code}
> But in Solr we found the following field name:
> {code}
> http_3a_2f_2fwww_alfresco_org_2fmodel_2fsystem_2f1_0_7dstore_identifier
> {code}
> The code involved in the Solr connector is the following:
> {code}
> protected static String preEncode(String fieldName)
> {
>   return URLEncoder.encode(fieldName);
> }
> {code}
> Probably we should try to solve it by removing the preEncode invocation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)