lucene-solr-dev mailing list archives

From "Khalid Yagoubi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1633) Solr Cell should be smarter about literal and multiValued="false"
Date Fri, 11 Dec 2009 21:46:19 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789552#action_12789552 ]

Khalid Yagoubi commented on SOLR-1633:
--------------------------------------

I have written a patch for the Tika Solr extraction that ignores Tika fields. It works, but I'm not
sure my patch is the best approach.
It solved my problem by keeping Tika from extracting metadata that conflicts with my own literal
non-multivalued field.
Example: <meta name="id" content="10"/> is extracted as id, but I also supply my own id: literal.id=12
==> error, because id is a non-multivalued field

Here is what my patch does:
- I patched SolrContentHandler.java
- I added a parameter contentOnly=true|false
- I ignore Tika metadata for fields that are defined in the schema
Ideas for improvement:
- Ignore only the metadata for which a literal.foo is given and the field is not multivalued
- Prefix these fields
- Find a better name for the parameter than contentOnly, perhaps ign.meta.conflict
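
The first idea above can be sketched in plain Java, without any Solr or Tika classes. This is only an illustration of the rule, not the patch itself: the class LiteralConflictFilter, the method merge, and the use of simple maps in place of the schema and of Tika's metadata are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: none of these names come from Solr or from the patch.
class LiteralConflictFilter {

    // Builds the field values for a document. A Tika value is dropped only when
    // a literal was supplied for the same field AND that field is single-valued.
    static Map<String, List<String>> merge(Map<String, String> literals,
                                           Map<String, List<String>> tikaMetadata,
                                           Set<String> multiValuedFields) {
        Map<String, List<String>> doc = new LinkedHashMap<>();

        // Literals always win, so add them first.
        for (Map.Entry<String, String> e : literals.entrySet()) {
            doc.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }

        for (Map.Entry<String, List<String>> e : tikaMetadata.entrySet()) {
            String field = e.getKey();
            boolean conflict = literals.containsKey(field)
                    && !multiValuedFields.contains(field);
            if (conflict) {
                continue; // keep the literal, ignore what Tika extracted
            }
            doc.computeIfAbsent(field, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> literals = new LinkedHashMap<>();
        literals.put("id", "12");          // literal.id=12
        literals.put("keywords", "solr");  // literal.keywords=solr

        Map<String, List<String>> tika = new LinkedHashMap<>();
        tika.put("id", Arrays.asList("10"));       // <meta name="id" content="10"/>
        tika.put("title", Arrays.asList("Hello"));
        tika.put("keywords", Arrays.asList("tika"));

        // Assume only "keywords" is declared multiValued="true" in the schema.
        Set<String> multiValued = Collections.singleton("keywords");

        // prints {id=[12], keywords=[solr, tika], title=[Hello]}
        System.out.println(merge(literals, tika, multiValued));
    }
}
```

With this rule the id from the <meta> tag is dropped in favour of literal.id, while a multivalued field still receives both the literal and the extracted value.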

I'll submit my patch tomorrow night.

Thanks for any suggestions.

> Solr Cell should be smarter about literal and multiValued="false"
> -----------------------------------------------------------------
>
>                 Key: SOLR-1633
>                 URL: https://issues.apache.org/jira/browse/SOLR-1633
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Hoss Man
>
> As noted on solr-user, SolrCell has less than ideal behavior when "foo" is a single-valued
> field, and literal.foo=bar is specified in the request, but Tika also produces a value for
> the "foo" field from the document.  It seems like a possible improvement here would be for
> SolrCell to ignore the value from Tika if it already has one that was explicitly provided
> (as opposed to the current behavior of letting the add fail because of multiple values in
> a single-valued field).
> It seems pretty clear that in cases like this, the user's intention is to have their one
> literal field used as the value.
> http://old.nabble.com/Re%3A-WELCOME-to-solr-user%40lucene.apache.org-to26650071.html#a26650071

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

