lucene-dev mailing list archives

From "Rahul Babulal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3668) offsets issues with multiword synonyms
Date Sat, 21 Apr 2012 05:38:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13258775#comment-13258775 ]

Rahul Babulal commented on LUCENE-3668:
---------------------------------------

I'm using Solr 3.6, and with luceneMatchVersion=3.6 in my solrconfig.xml I'm still seeing
issues with highlighting. However, setting luceneMatchVersion=3.3 fixes my issue.
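
For reference, a minimal sketch of the version switch described above. It assumes the stock Solr 3.x solrconfig.xml layout, where the setting is a top-level luceneMatchVersion element, and it assumes LUCENE_33 as the constant spelling of 3.3; neither detail is quoted from the original configuration.

```xml
<!-- solrconfig.xml (fragment, placed directly inside the <config> root) -->
<!-- Assumption: LUCENE_33 is the constant form of the "3.3" value reported above. -->
<!-- With the 3.6 value the highlighting offsets described in this comment are wrong. -->
<luceneMatchVersion>LUCENE_33</luceneMatchVersion>
```

Note that this setting is global: it changes the version-dependent behaviour of every analysis component in the core, not only the synonym filter.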

Issue details:

In my synonyms file I have:
nhl, national hockey league

If I index "Australian nhl team great", then:
Search use case 1: when I search for "hockey" (without quotes), the highlighted snippet
in the response is "Australian nhl <em>team</em> great".
Search use case 2: when I search for "league" (without quotes), the highlighted snippet
in the response is "Australian nhl team <em>great</em>".

Here are my fieldType and field definitions:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>        
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

<field name="description" type="text_synonym" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" omitNorms="false"/>
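
If lowering the version for the whole core is undesirable, the workaround can possibly be scoped to the synonym filter alone. The sketch below assumes that Solr's analysis factories accept a luceneMatchVersion attribute of their own; that attribute is not part of the configuration above and should be verified against the Solr release in use.

```xml
<!-- Hypothetical variant of the index-time synonym filter from the fieldType above. -->
<!-- Assumption: the factory honours a per-filter luceneMatchVersion attribute, -->
<!-- so only this filter gets 3.3 behaviour while solrconfig.xml stays at 3.6. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
        expand="true" luceneMatchVersion="LUCENE_33"/>
```

Because this changes index-time analysis, existing documents would need to be reindexed before the highlighting offsets change.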
   
                
> offsets issues with multiword synonyms
> --------------------------------------
>
>                 Key: LUCENE-3668
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3668
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch
>
>
> as reported on the list, there are some strange offsets with FSTSynonyms, in the case of multiword synonyms.
> as a workaround it was suggested to use the older synonym impl, but it has bugs too (just in a different way).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

