lucene-dev mailing list archives

From "Colin Hebert (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-3386) ExtractingRequestHandler applies fname settings to literals
Date Fri, 20 Apr 2012 15:10:40 GMT

     [ https://issues.apache.org/jira/browse/SOLR-3386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Hebert updated SOLR-3386:
-------------------------------

    Description: 
The SolrContentHandler.addLiterals() method calls SolrContentHandler.addField(), which itself
resolves the field name with SolrContentHandler.findMappedName().

That call makes sense for SolrContentHandler.addMetadata() [and others], because the user
can't set the field names otherwise. With literals, however, the field name is given explicitly
by the user and shouldn't be changed at all (applying unknownFieldPrefix or defaultField
might be acceptable, but even that doesn't seem quite right).

----

I ran into this issue with the following use case:

I have a schema containing a "title" field which is mandatory and single-valued.
My documents have an internal title which I pass as the value of the "title" field.
When I send one of these documents (an HTML document) and it contains "title" metadata,
I get an exception because the "title" field receives multiple values (an exception I expect).
To fix that I used "fname.title=tika_title", so that the title provided by Tika is kept under
another name.
Both titles (the one I pass manually and the one from the metadata) are now stored in the
field "tika_title", and I get an exception because the "title" field hasn't been provided at
all.
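
The behaviour described above can be reproduced with a small toy model. This is a minimal
sketch, not the Solr source: the class LiteralRenameDemo, its fieldMappings map (standing in
for the fname.* parameters) and the simplified method bodies are all assumptions made for
illustration. Only the method names findMappedName(), addField() and addMetadata() come from
the report.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the reported behaviour; hypothetical, not the Solr implementation. */
public class LiteralRenameDemo {
    // Hypothetical stand-in for the fname.* request parameters (source name -> target name).
    private final Map<String, String> fieldMappings;
    // Field name -> values collected for the document being built.
    private final Map<String, List<String>> document = new LinkedHashMap<>();

    public LiteralRenameDemo(Map<String, String> fieldMappings) {
        this.fieldMappings = fieldMappings;
    }

    // Simplified stand-in for findMappedName(): the mapped name, or the name unchanged.
    String findMappedName(String name) {
        return fieldMappings.getOrDefault(name, name);
    }

    // Simplified stand-in for addField(): it always renames before storing the value.
    void addField(String name, String value) {
        document.computeIfAbsent(findMappedName(name), k -> new ArrayList<>()).add(value);
    }

    // As reported: literals take the same renaming path as extracted metadata.
    void addLiteral(String name, String value) {
        addField(name, value);
    }

    void addMetadata(String name, String value) {
        addField(name, value);
    }

    Map<String, List<String>> getDocument() {
        return document;
    }

    public static void main(String[] args) {
        LiteralRenameDemo handler = new LiteralRenameDemo(Map.of("title", "tika_title"));
        handler.addLiteral("title", "Title supplied by the user");
        handler.addMetadata("title", "Title extracted by Tika");
        // prints {tika_title=[Title supplied by the user, Title extracted by Tika]}
        System.out.println(handler.getDocument());
    }
}
```

In this model no "title" entry exists after both calls, which matches the second exception
described above.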

----

An easy workaround for this bug is to send the literal under the name "my_title" and add the
following fnames: "fname.my_title=title&fname.title=tika_title". A small switcheroo that
puts the correct value back in the expected field.
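
As a rough illustration, the workaround parameters could be assembled like this. The class
WorkaroundParams and its buildQuery() helper are hypothetical. The "literal." prefix for
passing the literal is an assumption based on the usual Solr Cell convention, and the
"fname." spelling is taken from this report as written (the Solr Cell documentation, as far
as I know, spells the mapping parameter "fmap.<source>=<target>", so check it against your
version).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that builds the query string for the workaround. */
public class WorkaroundParams {

    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    static String buildQuery(String userTitle) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        // Assumption: literals are sent as literal.<field>=<value>.
        params.put("literal.my_title", userTitle);
        // Rename the literal back to the field the schema expects.
        params.put("fname.my_title", "title");
        // Keep the title extracted by Tika under another name.
        params.put("fname.title", "tika_title");
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // prints literal.my_title=My+title&fname.my_title=title&fname.title=tika_title
        System.out.println(buildQuery("My title"));
    }
}
```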

----

One way to fix this is to extract the first part of SolrContentHandler.addField() (the lowerNames
handling and the findMappedName() call) into a separate method (or to move the lowerNames check
into SolrContentHandler.findMappedName()), and to call that method (or findMappedName()) _before_
calling SolrContentHandler.addField() in the callers that need renaming, leaving literals untouched.
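
The proposed refactoring might look roughly like the following. This is a sketch under stated
assumptions, not a patch: LiteralRenameFixSketch and resolveFieldName() are hypothetical
names, the lowerNames handling is reduced to plain lowercasing, and the fieldMappings map
again stands in for the fname.* parameters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model of the proposed fix; hypothetical, not the Solr implementation. */
public class LiteralRenameFixSketch {
    private final Map<String, String> fieldMappings; // stand-in for fname.* parameters
    private final boolean lowerNames;                // simplified: lowercasing only
    private final Map<String, List<String>> document = new LinkedHashMap<>();

    public LiteralRenameFixSketch(Map<String, String> fieldMappings, boolean lowerNames) {
        this.fieldMappings = fieldMappings;
        this.lowerNames = lowerNames;
    }

    // Extracted "first part" of addField(): lowerNames handling plus name mapping.
    String resolveFieldName(String name) {
        String candidate = lowerNames ? name.toLowerCase(Locale.ROOT) : name;
        return fieldMappings.getOrDefault(candidate, candidate);
    }

    // addField() no longer renames; it stores under the name it is given.
    void addField(String resolvedName, String value) {
        document.computeIfAbsent(resolvedName, k -> new ArrayList<>()).add(value);
    }

    // Extracted metadata is renamed before addField() is called.
    void addMetadata(String name, String value) {
        addField(resolveFieldName(name), value);
    }

    // Literals keep the name chosen by the user.
    void addLiteral(String name, String value) {
        addField(name, value);
    }

    Map<String, List<String>> getDocument() {
        return document;
    }

    public static void main(String[] args) {
        LiteralRenameFixSketch handler =
                new LiteralRenameFixSketch(Map.of("title", "tika_title"), true);
        handler.addLiteral("title", "Title supplied by the user");
        handler.addMetadata("title", "Title extracted by Tika");
        // prints {title=[Title supplied by the user], tika_title=[Title extracted by Tika]}
        System.out.println(handler.getDocument());
    }
}
```

Whether unknownFieldPrefix or defaultField should still apply to literals is left open here,
as it is in the description.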

  was:
The SolrContentHandler.addLiterals() method call the SolrContentHandler.addField() which itself
obtain the field with SolrContentHandler.findMappedName().

If this call makes sense with SolrContentHandler.addMetadata() [and others]  because the user
can't set the name of the fields otherwise, but with literals, the name of the field is manually
given by the user so it shouldn't be changed at all (maybe applying unknownFieldPrefix or
defaultField could be done, but even that doesn't seem quite normal).

----

I got this issue with the following usecase:

I have a schema containing a "title" field which is mandatory and contains only one value.
My documents have an internal title which is used as the value of the "title" field.
When sending one of these documents (and HTML document), if it contains a "title" metadata
I get an exception because I have multiple values for my "title" field (as I would expect).
To fix that I used "fname.title=tika_title", so the title provided by tika is kept under another
name.
Both titles (the original one I pass manually, and the metadata one) are now named "tika_title"
and I get an exception because "title" hasn't been provided at all.

----

An easy workaround for this bug is sending the literal as "my_title", and adding the following
fnames "fname.my_title=title&fname.title=tika_title". A small swicheroo which puts back
the correct value in the expected field.

----

A way to fix that is extracting the first blocks of SolrContentHandler.addField() in an external
method (or put the lowerNames check in SolrContentHandler.findMappedName() ) and use that
external method (or findMappedName() ) _before_ calling SolrContentHandler.addField()

    
> ExtractingRequestHandler applies fname settings to literals
> -----------------------------------------------------------
>
>                 Key: SOLR-3386
>                 URL: https://issues.apache.org/jira/browse/SOLR-3386
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>    Affects Versions: 3.5
>            Reporter: Colin Hebert
>            Priority: Minor

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


