manifoldcf-dev mailing list archives

From "Ahmet Arslan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CONNECTORS-491) Solr Connector does not treat metadata fields with semicolons in them properly
Date Fri, 13 Jul 2012 10:32:33 GMT

    [ https://issues.apache.org/jira/browse/CONNECTORS-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413641#comment-13413641 ]

Ahmet Arslan commented on CONNECTORS-491:
-----------------------------------------

Hi Karl, here is the stack trace Salih sent me. It seems that the problem is not the ";" character.
The multiple values for author are an empty string and 36;#Emek Kilic.

I can see that MCF sets &literal.Author=36;#Emek+Kilic. Could it be that Tika extracts
an empty string as the Author metadata?

bq. fields are generated by Tika or passed in as literals via literal.fieldname=value. Before
Solr4.0 or if literalsOverride=false, then literals will be appended as multi-value to tika
generated field.
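
To make the quoted behavior concrete, here is a minimal sketch in plain Java. It is not Solr's actual code: the class LiteralMergeSketch and the method merge are hypothetical names, and the assumption is that Tika produced an empty string for the author while MCF passed the literal 36;#Emek Kilic. With literalsOverride=false the literal is appended and the field ends up with two values, which a non-multiValued field would reject.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralMergeSketch {

    /**
     * Hypothetical model of the quoted rule (not Solr's implementation):
     * when literalsOverride is false, the literal is appended to whatever
     * Tika extracted; when it is true, the literal replaces the Tika value.
     */
    public static List<String> merge(String tikaValue, String literalValue, boolean literalsOverride) {
        List<String> values = new ArrayList<>();
        if (literalsOverride) {
            // The literal wins; the Tika-extracted value is dropped.
            values.add(literalValue);
            return values;
        }
        if (tikaValue != null) {
            // An empty string still counts as a value here.
            values.add(tikaValue);
        }
        values.add(literalValue);
        return values;
    }

    public static void main(String[] args) {
        // Assumed inputs: empty Tika author plus the literal sent by MCF.
        System.out.println(merge("", "36;#Emek Kilic", false)); // two values
        System.out.println(merge("", "36;#Emek Kilic", true));  // one value
    }
}
```

If this model is right, the semicolon plays no part: the second value comes from the merge, not from splitting the literal.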

{code}

INFO: [collection1] webapp=/solr path=/update/extract params=
{
literal._ModerationComments=
&literal.DocIcon=pdf
&literal.PublishingExpirationDate=
&literal._ModerationStatus=3
&literal.FolderChildCount=0
&literal.ItemChildCount=0
&literal.ParentVersionString=
&literal._CopySource=
&literal.FileSizeDisplay=4136180
&literal._CheckinComment=
&literal.Edit=0
&literal.id=http://iknow:2525/Documents/Vekaletname.pdf
&literal.LinkFilenameNoMenu=Vekaletname.pdf
&literal.Created=2012-04-04+10:07:00
&literal._UIVersionString=0.1
&literal.Title=ornek1
&literal.Modified=2012-04-04+10:07:16
&literal.FileLeafRef=Vekaletname.pdf
&literal.Author=36;#Emek+Kilic
&literal.LinkFilename=Vekaletname.pdf
&literal.lcf_metadata_id=99
&literal.ParentLeafName=
&literal.Editor=36;#Emek+Kilic
&literal.CheckoutUser=
&literal.PublishingStartDate=
&literal.ContentType=Document
} {} 0 120
Jul 12, 2012 2:16:40 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR: [doc=http://iknow:2525/Documents/Vekaletname.pdf]
multiple values encountered for non multiValued field author: [, 36;#Emek Kilic]
{code}
                
> Solr Connector does not treat metadata fields with semicolons in them properly
> ------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-491
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-491
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 0.5.1
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 0.7
>
>
> The Solr connector does not escape metadata values that have ";" characters in them,
> and Solr thus tries to treat these as multi-valued fields.

