lucene-solr-dev mailing list archives

From "Peter Wolanin (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries
Date Sat, 27 Mar 2010 23:51:27 GMT

     [ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-1852:
--------------------------------

    Description: 
Symptom: when I search for a string such as a domain name containing a '.', the Solr 1.4 analyzer
tells me that I will get a match, but when I enter the search either in the client or directly
in Solr, the search fails.
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but when the content is on its own or in another sentence:

"Support Identi.ca"

the search matches.  Testing suggests the word "for" is the problem, and it looks like the
bug occurs when a stop word precedes a word that is split up by the word delimiter filter.

Setting enablePositionIncrements="false" in the stop filter and reindexing causes the searches
to match.
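
For reference, a minimal sketch of what that workaround might look like in schema.xml. The
field type name, the stopwords file name, and the filter order and attributes shown here are
assumptions for illustration only and are not copied from the schema linked above; the one
change that matters is the enablePositionIncrements attribute on the stop filter.

```xml
<!-- Hypothetical field type; the name "text" and the filter chain are illustrative assumptions. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Workaround: do not leave a position gap where a stop word was removed. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
            enablePositionIncrements="false"/>
    <!-- Splits tokens such as "Identi.ca" into "Identi" and "ca". -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As noted above, existing documents have to be reindexed before the change takes effect.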


According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either
due to the upgraded Lucene or changes to the WordDelimiterFilterFactory.


  was:
Symptom: searching for a string like a domain
name containing a '.', the Solr 1.4 analyzer tells me that I will get
a match, but when I enter the search either in the client or directly
in Solr, the search fails.  Our default handler is dismax, but this
also fails with the standard handler.  So I'm wondering if this is a
known issue, or am I missing something subtle in the analysis chain?
Solr is 1.4.0 that I built.

test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Setting enablePositionIncrements="false" in the stop filter and reindexing causes the searches
to match.

According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either
due to the upgraded Lucene or changes to the WordDelimiterFilterFactory.


        Summary: enablePositionIncrements="true" can cause searches to fail when they are
parsed as phrase queries  (was: enablePositionIncrements="true" causes searches to fail when
they are parse as phrase queries)

> enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase
queries
> -------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1852
>                 URL: https://issues.apache.org/jira/browse/SOLR-1852
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>         Attachments: SOLR-1852.patch
>
>
> Symptom: when I search for a string such as a domain name containing a '.', the Solr 1.4 analyzer
tells me that I will get a match, but when I enter the search either in the client or directly
in Solr, the search fails.
> test string:  Identi.ca
> queries that fail:  IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for Identi.ca on the follow block"
> fails to match "Identi.ca", but when the content is on its own or in another sentence:
> "Support Identi.ca"
> the search matches.  Testing suggests the word "for" is the problem, and it looks like
the bug occurs when a stop word precedes a word that is split up by the word delimiter filter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing causes the
searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk,
either due to the upgraded Lucene or changes to the WordDelimiterFilterFactory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

