lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4859) MinFieldValueUpdateProcessorFactory and MaxFieldValueUpdateProcessorFactory don't do numeric comparison for numeric fields
Date Mon, 03 Jun 2013 16:35:25 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673289#comment-13673289 ]

Jack Krupansky commented on SOLR-4859:
--------------------------------------

bq. I have a patch to fix JsonLoader 

I'm glad to hear that, but we still have the XML loader, CSV loader, and SolrCell.

I would prefer a fix to the mutating field processor code itself. I mean, how is it not
obvious that a processor labeled "Min/Max Field Value" should be able to handle numeric string
values when (if) the type of the field is known? Clearly a bad design was chosen. Why can't
we correct that bad design decision and eliminate the need for workarounds like
requiring users to explicitly convert types?

But... if this really bad design decision really is cast in concrete... any ETA on the explicit
conversion processors? And could the Javadoc be updated to highlight the pitfalls of min/max
for non-JSON field values?
                
> MinFieldValueUpdateProcessorFactory and MaxFieldValueUpdateProcessorFactory don't do numeric comparison for numeric fields
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4859
>                 URL: https://issues.apache.org/jira/browse/SOLR-4859
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.3
>            Reporter: Jack Krupansky
>
> MinFieldValueUpdateProcessorFactory and MaxFieldValueUpdateProcessorFactory are advertised
> as supporting numeric comparisons, but in practice only string comparison is available.
> Numeric comparison does not seem to be reachable through the standard loaders, although the
> unit tests show that it works at the unit test level.
> The problem is that numeric processing is dependent on the SolrInputDocument containing
> a list of numeric values, but at least with both the current XML and JSON loaders, only string
> values can be loaded.
> Test scenario.
> 1. Use Solr 4.3 example.
> 2. Add following update processor chain to solrconfig:
> {code}
>   <updateRequestProcessorChain name="max-only-num">
>     <processor class="solr.MaxFieldValueUpdateProcessorFactory">
>       <str name="fieldName">sizes_i</str>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {code}
> 3. Perform this update request:
> {code}
>   curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
>   -H 'Content-type:application/json' -d '
>   [{"id": "doc-1",
>     "title_s": "Hello World",
>     "sizes_i": [200, 999, 101, 199, 1000]}]'
> {code}
> Note that the values are JSON integer values.
> 4. Perform this query:
> {code}
> curl "http://localhost:8983/solr/select/?q=*:*&indent=true&wt=json"
> {code}
> Shows this result:
> {code}
>   "response":{"numFound":1,"start":0,"docs":[
>       {
>         "id":"doc-1",
>         "title_s":"Hello World",
>         "sizes_i":999,
>         "_version_":1436094187405574144}]
>   }}
> {code}
> sizes_i should be 1000, not 999.
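
The 999 result is what lexicographic string ordering would produce: compared character by character, "999" sorts after "1000" because '9' is greater than '1'. A minimal, self-contained sketch of the difference follows. It uses plain Java collections only, no Solr classes, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class StringVsNumericMax {
    // Maximum by String.compareTo, i.e. lexicographic order.
    static String maxAsStrings(List<String> values) {
        return Collections.max(values);
    }

    // Maximum by Integer.compareTo, i.e. numeric order.
    static Integer maxAsIntegers(List<Integer> values) {
        return Collections.max(values);
    }

    public static void main(String[] args) {
        // The same five values as in the update request above.
        System.out.println(maxAsStrings(Arrays.asList("200", "999", "101", "199", "1000"))); // prints 999
        System.out.println(maxAsIntegers(Arrays.asList(200, 999, 101, 199, 1000)));          // prints 1000
    }
}
```

By the same ordering, the string values "42", "128" and "-3" have "42" as their maximum, whereas the numeric maximum is 128.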
> Alternative update tests:
> {code}
>   curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
>   -H 'Content-type:application/json' -d '
>   [{"id": "doc-1",
>     "title_s": "Hello World",
>     "sizes_i": 200,
>     "sizes_i": 999,
>     "sizes_i": 101,
>     "sizes_i": 199,
>     "sizes_i": 1000}]'
> {code}
> and
> {code}
>   curl "http://localhost:8983/solr/update?commit=true&update.chain=max-only-num" \
>   -H 'Content-type:application/xml' -d '
>   <add>
>     <doc>
>       <field name="id">doc-1</field>
>       <field name="title_s">Hello World</field>
>       <field name="sizes_i">42</field>
>       <field name="sizes_i">128</field>
>       <field name="sizes_i">-3</field>
>     </doc>
>   </add>'
> {code}
> In XML, of course, there is no way for the input values to be anything other than strings
> ("text").
> The JSON loader does parse the values with their type, but immediately converts the values
> to strings:
> {code}
>     private Object parseSingleFieldValue(int ev) throws IOException {
>       switch (ev) {
>         case JSONParser.STRING:
>           return parser.getString();
>         case JSONParser.LONG:
>         case JSONParser.NUMBER:
>         case JSONParser.BIGNUMBER:
>           return parser.getNumberChars().toString();
>         case JSONParser.BOOLEAN:
>           return Boolean.toString(parser.getBoolean()); // for legacy reasons, single values are expected to be strings
>         case JSONParser.NULL:
>           parser.getNull();
>           return null;
>         case JSONParser.ARRAY_START:
>           return parseArrayFieldValue(ev);
>         default:
>           throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Error parsing JSON field value. Unexpected "+JSONParser.getEventString(ev) );
>       }
>     }
>     private List<Object> parseArrayFieldValue(int ev) throws IOException {
>       assert ev == JSONParser.ARRAY_START;
>   
>       ArrayList lst = new ArrayList(2);
>       for (;;) {
>         ev = parser.nextEvent();
>         if (ev == JSONParser.ARRAY_END) {
>           return lst;
>         }
>         Object val = parseSingleFieldValue(ev);
>         lst.add(val);
>       }
>     }
>   }
> {code}
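
For contrast, here is a minimal sketch of what keeping the numeric type could look like. It is not the JsonLoader patch mentioned in this thread and uses no Solr or noggit classes. The helper `toFieldValue` and the class name are hypothetical, and mapping numbers to Long or Double is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TypedValueSketch {
    // Hypothetical helper: turn the characters of a JSON number into a Long when
    // they parse as a long, otherwise into a Double.
    static Object toFieldValue(String numberChars) {
        try {
            return Long.valueOf(numberChars);
        } catch (NumberFormatException e) {
            return Double.valueOf(numberChars);
        }
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<>();
        for (String s : new String[] {"200", "999", "101", "199", "1000"}) {
            sizes.add((Long) toFieldValue(s)); // all inputs here are integers
        }
        // With Long values the comparison is numeric rather than lexicographic.
        System.out.println(Collections.max(sizes));
    }
}
```

One caveat with this kind of approach: a field that mixes Long and Double values would no longer be mutually comparable through their natural ordering, so a min/max step would need to normalize the values to one type first.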
> Originally, I had hoped/expected that the schema type of the field would determine the
> type of min/max comparison - integer for a *_i field in my case.
> The comparison logic for min:
> {code}
> public final class MinFieldValueUpdateProcessorFactory extends FieldValueSubsetUpdateProcessorFactory {
>   @Override
>   @SuppressWarnings("unchecked")
>   public Collection pickSubset(Collection values) {
>     Collection result = values;
>     try {
>       result = Collections.singletonList
>         (Collections.min(values));
>     } catch (ClassCastException e) {
>       throw new SolrException
>         (BAD_REQUEST, 
>          "Field values are not mutually comparable: " + e.getMessage(), e);
>     }
>     return result;
>   }
> {code}
> This seems to depend only on the runtime type of the input values, not on the
> field type itself.
> It would be nice to at least have a comparison override: compareNumeric="true".
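
As a rough sketch of what such an override might do, the code below compares values by their numeric interpretation while returning the original value object. `pickMaxNumeric` and the class name are hypothetical and not part of Solr. Parsing through BigDecimal is an assumption, chosen so that integers and decimals compare consistently. The sketch needs Java 8 or later.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class NumericMaxSketch {
    // Orders values by the number their toString() represents.
    // A NumberFormatException propagates if a value is not numeric.
    static final Comparator<Object> NUMERIC =
        Comparator.comparing((Object v) -> new BigDecimal(v.toString()));

    // Hypothetical counterpart of pickSubset for the max case with numeric comparison.
    static Collection<Object> pickMaxNumeric(Collection<Object> values) {
        return Collections.singletonList(Collections.max(values, NUMERIC));
    }

    public static void main(String[] args) {
        List<Object> values = Arrays.asList("200", "999", "101", "199", "1000");
        System.out.println(pickMaxNumeric(values)); // prints [1000]
    }
}
```

Because only the comparator changes, the selected value keeps whatever type the loader produced, so string input stays string input. A real implementation would presumably also want to turn the NumberFormatException into a BAD_REQUEST error, as the existing ClassCastException handling does.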

