manifoldcf-dev mailing list archives

From "Karl Wright (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-1034) Manifold 1.7
Date Wed, 17 Sep 2014 12:49:33 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137164#comment-14137164 ]

Karl Wright commented on CONNECTORS-1034:
-----------------------------------------

Hi Edgardo,

First, since this is the same issue as CONNECTORS-956, and CONNECTORS-956 is still open, let's
close this issue and discuss your problem in that ticket.

Second, the issue is that SolrJ (and, apparently, Solr as well, to some extent) simply does
not support field names containing characters outside a very specific set.  Until
Solr changes this behavior, we cannot fix it.  Even if you managed to send a field that included
an illegal character to SolrJ and therefore to Solr, there's no guarantee that that would
work.  URL encoding is not ideal for this purpose, so if you could look up the list of disallowed
field name characters, we could try to be more specific about which characters we encode and
which we don't.
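
As a rough sketch of what "more specific" encoding could look like (this is not ManifoldCF's
actual preEncode() code; the class name, the method names, and the safe set of ASCII letters,
digits, and underscore are all assumptions for illustration), one could escape only the
characters outside an allowed set as an underscore followed by the character's hex code.  That
reproduces the cmis:name to cmis_3Aname mapping from the report:

```java
// Hypothetical sketch only: not the real ManifoldCF or SolrJ implementation.
final class FieldNameSketch {
    private FieldNameSketch() {}

    // Assumption: ASCII letters, digits, and underscore are the only "safe" characters.
    // The real list of disallowed field name characters would have to come from Solr.
    static boolean isSafe(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
    }

    // Replace each unsafe character with '_' followed by its uppercase hex code.
    static String encode(String fieldName) {
        StringBuilder sb = new StringBuilder(fieldName.length());
        for (int i = 0; i < fieldName.length(); i++) {
            char c = fieldName.charAt(i);
            if (isSafe(c)) {
                sb.append(c);
            } else {
                sb.append('_').append(String.format("%02X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("cmis:name")); // prints cmis_3Aname
    }
}
```

Because underscore itself passes through unescaped, a scheme like this is not reversible, and
the safe set above is a guess until the actual list of disallowed characters is known.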

Third, the behavior of SolrJ with regard to this issue is very broken.  SolrJ originally did
not do anything to ensure that legal XML was generated for field names, because its developers assumed
that nobody would be using field names that contained illegal characters.  So, no encoding
at all will almost certainly lead to badly formed XML for many or even most documents, unless
SolrJ has been changed to address this issue.  (I opened a SOLR ticket for this problem, but
the Solr team declined to fix it for many releases, and since then I've lost track.)

Fourth, now we have backwards compatibility issues, because people have named their Solr fields
based on ManifoldCF's workaround behavior to the above problems.  Your suggestion of a UI
switch would address ONLY this last issue.

So, given all that, let's continue the discussion in the CONNECTORS-956 ticket, and I'll close
this one.



> Manifold 1.7
> ------------
>
>                 Key: CONNECTORS-1034
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1034
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Solr-4.x-component
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Edgardo Ambrosi
>              Labels: patch
>
> Following issue CONNECTORS-956: since the behavior makes ManifoldCF unusable in an
> Alfresco-Solr-based environment, because it is impossible to correctly populate Solr, could
> you provide at least a solution such as
> a checkbox in the "job specification" JSP page, tab "Solr Field Mapping", near "Keep
> All Metadata", to choose preEncode() or not?
> Our Use Case is: 
> Alfresco Server 4.2 enterprise, ManifoldCF, Solr server 4.7.1.
> Set a repo connection type CMIS, 
> Set a output connection type Solr, 
> Set a job with cmis query as "select * from cmis:document" (the repo has only 1 document),
> The job runs and ends normally, but...
> when querying Solr, the result set reports a strange encoding of the field name:
> if in Alfresco the field is named: cmis:name
> then in Solr, after ManifoldCF has populated it, the index contains the encoded field as
> cmis_3Aname
> Best



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
