lucene-solr-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Updated: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries
Date Sun, 28 Mar 2010 13:03:27 GMT


Robert Muir updated SOLR-1852:

    Attachment: SOLR-1852_testcase.patch

Attached is a test case demonstrating the bug.

The problem is that if you have, for example, "the lucene.solr", where "the" is a stopword,
the Solr 1.4 WordDelimiterFilter (WDF) bumps the position increment of *both* the "lucene"
and "solr" tokens:

* lucene (posInc=2)
* solr (posInc=2)
* lucenesolr (posInc=0)

Instead it should look like:

* lucene (posInc=2)
* solr (posInc=1)
* lucenesolr (posInc=0)

In my opinion the behavior of trunk is correct, and the Solr 1.4 behavior is a bug.
But I don't know how to fix just Solr 1.4's WDF in a better way than dropping in the entire
rewritten WDF...
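
To make the difference between the two token lists above concrete, here is a minimal sketch in plain Python. It does not use any Lucene or Solr API: the helper names `absolute_positions` and `adjacent` are hypothetical, and the model assumes that a token's position is the running sum of position increments (with the first token of a stream at position 0) and that an exact phrase match needs the second term exactly one position after the first.

```python
def absolute_positions(tokens):
    """Turn (term, posInc) pairs into (term, absolute position) pairs."""
    position = -1  # so that a first token with posInc=1 lands on position 0
    placed = []
    for term, pos_inc in tokens:
        position += pos_inc
        placed.append((term, position))
    return placed


def adjacent(placed, first, second):
    """True if `second` occurs exactly one position after `first`."""
    first_positions = {p for t, p in placed if t == first}
    second_positions = {p for t, p in placed if t == second}
    return any(p + 1 in second_positions for p in first_positions)


# Token streams for "the lucene.solr" with "the" removed as a stopword,
# copied from the two lists in this message.
SOLR_14 = [("lucene", 2), ("solr", 2), ("lucenesolr", 0)]
EXPECTED = [("lucene", 2), ("solr", 1), ("lucenesolr", 0)]

print(absolute_positions(SOLR_14))   # [('lucene', 1), ('solr', 3), ('lucenesolr', 3)]
print(absolute_positions(EXPECTED))  # [('lucene', 1), ('solr', 2), ('lucenesolr', 2)]
print(adjacent(absolute_positions(SOLR_14), "lucene", "solr"))   # False
print(adjacent(absolute_positions(EXPECTED), "lucene", "solr"))  # True
```

Under these assumptions, the Solr 1.4 increments leave a one-position hole between "lucene" and "solr", so a query for "lucene.solr" that is parsed as the phrase "lucene solr" would not find the two terms next to each other.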

> enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries
> -------------------------------------------------------------------------------------------------
>                 Key: SOLR-1852
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Peter Wolanin
>         Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch
> Symptom: when searching for a string like a domain name containing a '.', the Solr 1.4 analyzer
> tells me that I will get a match, but when I enter the search either in the client or directly
> in Solr, the search fails.
> test string:
> queries that fail:  IdentiCa,, Identi-ca
> query that matches: Identi ca
> schema in use is:
> Screen shots:
> analysis:
> dismax search:
> dismax search:
> standard search:
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for on the follow block"
> fails to match "", but putting the content on its own or in another sentence:
> "Support"
> the search matches.  Testing suggests the word "for" is the problem, and it looks like
> the bug occurs when a stop word precedes a word that is split up by the word delimiter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing causes the
> searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk,
> either due to the upgraded Lucene or changes to the WordDelimiterFactory.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
