lucene-dev mailing list archives

From "Erik Hatcher (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-11358) Support DelimitedTermFrequencyTokenFilter-using fields with payload() function
Date Thu, 17 May 2018 15:19:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479196#comment-16479196 ]

Erik Hatcher edited comment on SOLR-11358 at 5/17/18 3:18 PM:
--------------------------------------------------------------

Coming back to this and double-checking the test cases and implementation, I question whether
it is really useful to have `payload()` return the same value that `termfreq()` would.
 

At least let's add:

    <dynamicField name="*_dtf" type="delimited_term_frequency" indexed="true" stored="true" omitPositions="true"/>

    <fieldType name="delimited_term_frequency" stored="false" indexed="true" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedTermFrequencyTokenFilterFactory"/>
      </analyzer>
    </fieldType>

to the default managed-schema.

I could see it being handy if you're comparing *_dpi and *_dtf performance, toggling back
and forth, and want the switch to be transparent. But with the current payload scoring
queries, these delimited tf fields aren't going to behave as if they were truly payloaded.
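
To make the *_dpi versus *_dtf comparison concrete, here is a toy Python model of the two analysis chains. It is not Lucene or Solr code: the function names are hypothetical, the `|` delimiter is assumed to be the default for both filters, summing repeated terms is an assumption of the model, and "first payload" stands in for whatever `payload()` would actually pick. It only shows why the same input text could yield the same number through either path.

```python
# Toy model only; names and behaviour are illustrative assumptions, not Lucene APIs.
DELIMITER = "|"  # assumed default delimiter for both filters


def analyze_dtf(text):
    """Whitespace-tokenize; the integer after the delimiter becomes the term frequency."""
    freqs = {}
    for token in text.split():
        term, sep, value = token.partition(DELIMITER)
        freq = int(value) if sep else 1  # no delimiter: ordinary frequency of 1
        if freq < 1:
            raise ValueError("term frequency must be a positive integer: " + token)
        # Assumption of this model: repeated terms add their frequencies together.
        freqs[term] = freqs.get(term, 0) + freq
    return freqs


def analyze_dpi(text):
    """Whitespace-tokenize; the integer after the delimiter is kept as a per-position payload."""
    payloads = {}
    for position, token in enumerate(text.split()):
        term, sep, value = token.partition(DELIMITER)
        if sep:
            payloads.setdefault(term, []).append((position, int(value)))
    return payloads


def first_payload(payloads, term, default=0):
    """Return the first payload stored for a term, or a default when there is none."""
    entries = payloads.get(term)
    return entries[0][1] if entries else default


text = "solr|5 lucene|2"
dtf = analyze_dtf(text)
dpi = analyze_dpi(text)
print(dtf["solr"], first_payload(dpi, "solr"))
```

In this model the value survives either way, but only the payload path keeps a per-position entry, which is the kind of information the payload scoring queries rely on.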

Thoughts?   

 



> Support DelimitedTermFrequencyTokenFilter-using fields with payload() function
> ------------------------------------------------------------------------------
>
>                 Key: SOLR-11358
>                 URL: https://issues.apache.org/jira/browse/SOLR-11358
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erik Hatcher
>            Assignee: Erik Hatcher
>            Priority: Major
>         Attachments: SOLR-11358.patch
>
>
> payload() works with values encoded with DelimitedPayloadTokenFilter. payload() can be modified
> to return the term frequency instead, when the field uses DelimitedTermFrequencyTokenFilter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


