lucene-solr-dev mailing list archives

From "Peter Wolanin (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries
Date Sat, 27 Mar 2010 23:53:27 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-1852:
--------------------------------

    Description: 
Symptom: when searching for a string containing a '.', such as a domain name, the Solr 1.4
analyzer tells me that I will get a match, but when I enter the search, either in the client
or directly in Solr, the search fails.
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca
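
A rough way to see why only some spellings fail: "IdentiCa", "Identi.ca" and "Identi-ca" are each a single whitespace-separated query word that the word delimiter filter splits into two subwords, which the query parser then turns into a phrase query (as the issue title suggests), whereas "Identi ca" is already two words and becomes two independent terms. The Python sketch below is a toy model under that assumption only. split_subwords and parse_query are hypothetical helpers, not Solr or Lucene APIs, and the real filter's output also depends on the WordDelimiterFilterFactory options set in the linked schema (catenation, case-change splitting, etc.).

```python
import re


def split_subwords(word):
    """Toy stand-in for the word delimiter filter (hypothetical, not Solr code):
    split on non-alphanumeric characters and on lower-to-upper case changes,
    then lowercase each piece."""
    pieces = []
    for part in re.split(r"[^A-Za-z0-9]+", word):
        # "IdentiCa" -> ["Identi", "Ca"]; "ca" -> ["ca"]
        pieces += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part)
    return [p.lower() for p in pieces]


def parse_query(text):
    """Toy query parser (hypothetical): each whitespace-separated word that
    analyzes to several subwords becomes a PHRASE clause, where term positions
    matter; a word that analyzes to one subword becomes a plain TERM clause."""
    clauses = []
    for word in text.split():
        subwords = split_subwords(word)
        if len(subwords) > 1:
            clauses.append(("PHRASE", subwords))
        elif subwords:
            clauses.append(("TERM", subwords[0]))
    return clauses


for query in ["IdentiCa", "Identi.ca", "Identi-ca", "Identi ca"]:
    print(query, "->", parse_query(query))
```

In this model the three failing queries all become PHRASE clauses over ["identi", "ca"], while "Identi ca" becomes two TERM clauses. Plain term clauses carry no position constraint, so they can still match even if the indexed positions of the subwords are off.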


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but putting the content on its own or in another sentence:

"Support Identi.ca"

the search matches. Testing suggests the word "for" is the problem: the bug appears to occur
when a stop word precedes a word that is split up by the word delimiter filter.

Setting enablePositionIncrements="false" in the stop filter and reindexing causes the searches
to match.
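
The observations above (a preceding stop word triggers the failure, and disabling position increments avoids it) are consistent with a simple hypothesis: when the stop filter leaves a position gap, the split subwords end up further apart in the index than the query's phrase expects. The Python sketch below models that under explicit assumptions. STOP_WORDS is a made-up list, not the stop word file from the linked schema, and the buggy_wdf flag imitates a hypothetical delimiter filter that reapplies the original token's position increment to every subword. It is not Solr 1.4's actual WordDelimiterFilter code; it only reproduces the reported pattern of matches and failures.

```python
import re

# Hypothetical stop list; only "for" matters for the report's example.
STOP_WORDS = {"a", "be", "for", "on", "the", "to"}


def analyze(text, enable_position_increments, buggy_wdf):
    """Toy index-time analysis returning (term, position) pairs.

    enable_position_increments: like the stop filter option, a removed stop
    word leaves a gap before the next emitted token.
    buggy_wdf: hypothetical defect where every subword of a split token
    repeats that token's position increment instead of advancing by 1.
    """
    tokens = []
    position = -1
    pending_gap = 0  # stop words removed since the last emitted token
    for word in text.split():
        if word.lower() in STOP_WORDS:
            if enable_position_increments:
                pending_gap += 1
            continue
        # Split on non-alphanumerics, e.g. "Identi.ca" -> ["identi", "ca"]
        subwords = [p.lower() for p in re.split(r"[^A-Za-z0-9]+", word) if p]
        if not subwords:
            continue
        increment = 1 + pending_gap
        pending_gap = 0
        for i, sub in enumerate(subwords):
            position += increment if (i == 0 or buggy_wdf) else 1
            tokens.append((sub, position))
    return tokens


def phrase_matches(tokens, phrase):
    """True if the phrase terms occur at consecutive positions, as an exact
    (slop 0) phrase query requires."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + k in positions.get(term, set()) for k, term in enumerate(phrase))
        for start in positions.get(phrase[0], set())
    )


long_text = "would be great to have support for Identi.ca on the follow block"
short_text = "Support Identi.ca"
phrase = ["identi", "ca"]  # what the query "Identi.ca" analyzes to

print(phrase_matches(analyze(long_text, True, True), phrase))   # -> False
print(phrase_matches(analyze(short_text, True, True), phrase))  # -> True
print(phrase_matches(analyze(long_text, False, True), phrase))  # -> True
print(phrase_matches(analyze(long_text, True, False), phrase))  # -> True
```

In the buggy case the long sentence indexes "identi" at position 7 and "ca" at 9 because of the gap left by "for", so the phrase's required offset of 1 is never found. Without a preceding stop word, or with increments disabled, the increment is 1 and the subwords stay adjacent. The last line shows what a filter that advances later subwords by exactly 1 would produce.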


According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either
due to the upgraded Lucene or due to changes to the WordDelimiterFilterFactory.


  was:
Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer
tells me that I will get a match, but when I enter the search either in the client or directly
in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but putting the content on its own or in another sentence:

"Support Identi.ca"

the search matches.  Testing suggests the word "for" is the problem, and it looks like the
bug occurs when a stop word precedes a word that is split up using the whitespace delimiter.

Setting enablePositionIncrements="false" in the stop filter and reindexing causes the searches
to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either
due to the upgraded lucene or changes to the WordDelimiterFactor



> enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1852
>                 URL: https://issues.apache.org/jira/browse/SOLR-1852
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>         Attachments: SOLR-1852.patch
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

