manifoldcf-dev mailing list archives

From Karl Wright <>
Subject Re: Solr Output Connector - Questions
Date Thu, 05 Jun 2014 13:26:49 GMT
Ah, it seems I remember it backwards.

SolrJ did *not* url-encode field names; that was the bug.  Instead, it
tried to send field names in unencoded form, which would mess up URLs and
cause problems.  See this method in

  /** Preprocess field name.
  * SolrJ has a bug where it does not URL-escape field names.  This causes carnage for
  * ManifoldCF, because it results in IllegalArgumentExceptions getting thrown deep in SolrJ.
  * See CONNECTORS-630.
  * In order to get around this, we need to URL-encode argument names, at least until the underlying
  * SolrJ issue is fixed.
  */
  protected static String preEncode(String fieldName)
  {
    return URLEncoder.encode(fieldName);
  }

It sounds like the SolrJ issue may have been fixed.  Can you try, or have
the customer try, changing this method to just "return fieldName;"?  Make
sure that you can still ingest documents that have funky field names that
include international characters and punctuation; these come from some of
our connectors.
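
To make the comparison concrete, here is a minimal sketch of what the two variants do to a field name. It is not the connector code: the class name, the passThrough helper, and the sample field names are made up for illustration, and it uses the two-argument URLEncoder.encode with UTF-8 rather than the one-argument call quoted above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class; not part of ManifoldCF.
public class FieldNameEncodingSketch {

  // Same idea as the quoted workaround: URL-encode the field name (UTF-8 assumed).
  public static String preEncode(String fieldName) {
    try {
      return URLEncoder.encode(fieldName, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on a compliant JVM.
      throw new IllegalStateException(e);
    }
  }

  // The proposed change: hand the field name to SolrJ untouched.
  public static String passThrough(String fieldName) {
    return fieldName;
  }

  public static void main(String[] args) {
    // Made-up names containing punctuation and an international character.
    String[] names = { "title", "{ns}store_identifier", "caf\u00e9:name" };
    for (String name : names) {
      System.out.println(passThrough(name) + " -> " + preEncode(name));
    }
  }
}
```

Plain alphanumeric names come out unchanged, so only names with punctuation or non-ASCII characters are affected, and those are the ones to test after the change.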


On Thu, Jun 5, 2014 at 9:20 AM, Karl Wright <> wrote:

> Hi Piergiorgio,
> I had a back-and-forth with Erik Hatcher about this issue about a year
> ago.  Solr technically accepts only a limited set of characters, and SolrJ
> is therefore not well coded to deal with anything much out of the
> ordinary.  I tried to get them to consider removing the url encoding, but
> they said no for that reason: "you are doing things which you shouldn't be
> doing anyway".
> We did work around this problem for field *values*.  I'll review what was
> done to see if it can be applied to field *names* too.  In the meantime,
> please open a ticket to track the issue.
> Thanks,
> Karl
> On Thu, Jun 5, 2014 at 9:13 AM, Piergiorgio Lucidi <> wrote:
>> Hi guys,
>> I'm wondering if it is possible to disable the URL encoding on fields on
>> this connector.
>> We are trying to index a repository whose field names contain a URI,
>> similar to the following:
>> {}store_identifier
>> But the field is stored in Solr in this way:
>> http_3a_2f_2fwww_alfresco_org_2fmodel_2fsystem_2f1_0_7dstore_identifier
>> That is a problem: our customer has to create a copyField to work around it.
>> Is there a reason why we continue to use the URL encoding?
>> I saw in the code that it seems to be an issue related to SolrJ, but do
>> you think that we can find a workaround for this?
>> Thank you.
>> Cheers,
>> Piergiorgio
>> --
>> Piergiorgio Lucidi
>> Open Source ECM Specialist
