lucene-dev mailing list archives

From "Andy Lester (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2890) omitTermFreqAndPositions should be specifiable on fieldType
Date Mon, 12 Nov 2012 22:53:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495717#comment-13495717 ]

Andy Lester commented on SOLR-2890:
-----------------------------------

I believe this is a Bug, not an Improvement, and that it is not Minor.

The docs at http://wiki.apache.org/solr/SchemaXml explicitly state that "Common options that
field types can have are..." and list omitTermFreqAndPositions among them.

In my case, I created a custom type for ISBNs specified like so:

        <fieldType name="isbn" class="solr.TextField" stored="true" sortMissingLast="true"
                   omitNorms="true" omitTermFreqAndPositions="true">
            <analyzer>
                <!-- Remove anything not a digit or X -->
                <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="[^0-9Xx]"
                    replacement=""
                    replace="all"
                    />
                <tokenizer class="solr.KeywordTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>
        

with the field definition like so:

        <field name="isbn" type="isbn" multiValued="true" />
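For reference, here is a rough plain-Java approximation of what that analyzer chain does to an incoming value. It is only a sketch: it uses java.util.regex instead of Lucene's analysis classes, and the class name IsbnAnalysisSketch is made up for illustration.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical, dependency-free approximation of the "isbn" analysis chain:
// regex char filter -> keyword tokenizer (whole value is one token) -> lowercase.
final class IsbnAnalysisSketch {
    // Same pattern as the PatternReplaceCharFilterFactory: anything not a digit or X/x.
    private static final Pattern NON_ISBN_CHARS = Pattern.compile("[^0-9Xx]");

    static String analyze(String raw) {
        // charFilter step: delete every character that is not a digit or X/x
        String filtered = NON_ISBN_CHARS.matcher(raw).replaceAll("");
        // KeywordTokenizer emits the filtered input as a single token;
        // LowerCaseFilter then lowercases it, so an "X" check digit becomes "x".
        return filtered.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(analyze("ISBN 0-8044-2957-X")); // 080442957x
        System.out.println(analyze("978-0-13-468599-1"));  // 9780134685991
    }
}
```

Since every ISBN collapses to a single token, positions carry no useful information for this field, which is why omitting them is attractive here.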

It was surprising, then, to find that my core's index directory had 600MB of *.prx files,
when there should not be any position information anywhere in the core.

When I then updated the field definition to:

        <field name="isbn" type="isbn" omitTermFreqAndPositions="true" multiValued="true" />

and reindexed the core, the *.prx files were no longer created.
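A quick way to repeat this before/after check is to total the size of the *.prx files in the index directory. The sketch below is a hypothetical helper (PrxSizeCheck is not part of Solr or Lucene). It assumes the segments are stored as separate files: anything packed into .cfs compound files will not be counted, and it only looks for the .prx extension seen in this index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: sums the sizes of *.prx (positions) files in one index directory.
final class PrxSizeCheck {
    static long totalPrxBytes(Path indexDir) throws IOException {
        long total = 0;
        // The glob restricts the listing to *.prx entries; subdirectories are not searched.
        try (DirectoryStream<Path> prxFiles = Files.newDirectoryStream(indexDir, "*.prx")) {
            for (Path p : prxFiles) {
                total += Files.size(p);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the core's data/index directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.printf("%d bytes in *.prx files%n", totalPrxBytes(dir));
    }
}
```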

Based on David Smiley's reading of the code in TextField.java, the culprit seems to be:

    if (schema.getVersion() > 1.1f) properties &= ~OMIT_TF_POSITIONS;

That line clears only the OMIT_TF_POSITIONS bit, which is at least reassuring: omitNorms and
omitPositions set on a fieldType appear to be left alone.
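To make the effect of that line concrete, here is a small model of the properties bit set. The flag values, the merge rule in fieldProps(), and all names are assumptions for illustration, not Solr's actual constants or code. The point is only that clearing one bit at the fieldType level discards the type's setting, while an explicit field-level setting turns it back on, which matches what I saw.

```java
// Hypothetical model of fieldType/field property bits; values and names are illustrative only.
final class OmitTfPositionsSketch {
    static final int OMIT_NORMS = 1 << 0;
    static final int OMIT_TF_POSITIONS = 1 << 1;
    static final int OMIT_POSITIONS = 1 << 2;

    // Mirrors the quoted TextField line: for schema versions above 1.1 the
    // type-level OMIT_TF_POSITIONS bit is cleared; all other bits are kept.
    static int initTextFieldType(int declaredTypeProps, float schemaVersion) {
        int properties = declaredTypeProps;
        if (schemaVersion > 1.1f) properties &= ~OMIT_TF_POSITIONS;
        return properties;
    }

    // Assumed merge rule for a <field>: start from the type's bits, clear what the
    // field explicitly sets to false, then set what it explicitly sets to true.
    static int fieldProps(int typeProps, int fieldTrue, int fieldFalse) {
        return (typeProps & ~fieldFalse) | fieldTrue;
    }

    static boolean has(int props, int flag) {
        return (props & flag) != 0;
    }

    public static void main(String[] args) {
        int declared = OMIT_NORMS | OMIT_TF_POSITIONS;         // as on the "isbn" fieldType
        int typeProps = initTextFieldType(declared, 1.4f);
        int plain = fieldProps(typeProps, 0, 0);                // field without the attribute
        int explicit = fieldProps(typeProps, OMIT_TF_POSITIONS, 0); // attribute on the field
        System.out.println(has(plain, OMIT_TF_POSITIONS));    // false: positions get indexed
        System.out.println(has(explicit, OMIT_TF_POSITIONS)); // true: no positions
        System.out.println(has(plain, OMIT_NORMS));           // true: omitNorms survives
    }
}
```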

The fix to this could be as simple as changing the wiki to state that omitTermFreqAndPositions
must be specified at the field level.
                
> omitTermFreqAndPositions should be specifiable on fieldType
> -----------------------------------------------------------
>
>                 Key: SOLR-2890
>                 URL: https://issues.apache.org/jira/browse/SOLR-2890
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>    Affects Versions: 3.4
>            Reporter: David Smiley
>            Priority: Minor
>
> Setting omitTermFreqAndPositions="true" doesn't work when I put it on a fieldType definition
> for my text field.  It did work when I put it on the field definition.  I think this option
> and probably all options should be settable at the fieldType level.  I did some investigation
> and found that the value of this option was being reset on line 54 of TextField.
> FYI I am trying to put this on a field type for use by the SpellCheck component, which
> has no use for term frequencies and positions from the source field.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


