lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Help in resolving the below retrieval issue
Date Tue, 10 Sep 2013 20:20:02 GMT
Removing stray hyphens (embedded hyphens, like "CD-ROM", are okay) or 
escaping them with a backslash looks like your best bet. There's no query 
parser option to disable the hyphen as an exclusion operator, although an 
upgrade to a "modern" Solr should fix the problem.
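
As a rough client-side sketch of the two workarounds above (not part of 
Solr; the function names are made up for illustration), a "stray" hyphen is 
assumed here to be one with whitespace or the string boundary on both sides. 
A hyphen attached to a term ("-kumar") is left alone on the assumption that 
it is a deliberate exclusion, and embedded hyphens ("CD-ROM") are untouched.

```python
import re

# Assumption: a "stray" hyphen stands alone, with whitespace (or the start
# or end of the string) on both sides. "CD-ROM" and "-kumar" do not match.
_STRAY_HYPHEN = re.compile(r"(?<!\S)-(?!\S)")


def escape_stray_hyphens(query: str) -> str:
    """Prefix each standalone hyphen with a backslash."""
    return _STRAY_HYPHEN.sub(r"\\-", query)


def remove_stray_hyphens(query: str) -> str:
    """Drop each standalone hyphen and collapse the leftover whitespace."""
    return " ".join(_STRAY_HYPHEN.sub(" ", query).split())


if __name__ == "__main__":
    print(escape_stray_hyphens("Rahul - kumar"))
    print(remove_stray_hyphens("Rahul - kumar"))
```

Either function would be applied to the user's input before it is sent as 
the q (or fq) parameter.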

-- Jack Krupansky

-----Original Message----- 
From: Prathik Puthran
Sent: Tuesday, September 10, 2013 4:13 PM
To: solr-user@lucene.apache.org
Subject: Re: Help in resolving the below retrieval issue

I'm using Solr 3.4.


Is this bug causing the 2nd term, i.e. "kumar", to be treated as an
excluded term?
Is it possible to configure the query parser not to treat the '-' as an
exclusion operator?
If not, is the only way to remove the '-' from the query string?

Thanks,
Prathik


On Tue, Sep 10, 2013 at 10:36 PM, Jack Krupansky 
<jack@basetechnology.com> wrote:

> What release of Solr are you using?
>
> It appears that the hyphen is being treated as an exclusion operator even
> though it is followed by a space. Solr 4.4 doesn't appear to do that, but
> maybe earlier releases had a problem.
>
> In any case, be careful with leading hyphen in queries since it does mean
> exclude documents that contain the following term.
>
> Or, just escape any leading hyphen with a backslash.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Prathik Puthran
> Sent: Tuesday, September 10, 2013 11:47 AM
> To: dev@lucene.apache.org ; solr-user@lucene.apache.org
> Subject: Re: Help in resolving the below retrieval issue
>
>
> Thanks Erick for the response.
> I tried to debug the query. Below is the response in the debug node
>
> <str name="rawquerystring">Rahul - kumar</str>
> <str name="querystring">Rahul - kumar</str>
> <str name="parsedquery">+text:Rahul -text:kumar</str>
> <str name="parsedquery_toString">+text:Rahul -text:kumar</str>
> <lst name="explain"/>
> <str name="QParser">LuceneQParser</str>
> <arr name="filter_queries"><str>Rahul - kumar</str></arr>
> <arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>
>
>
> Does it mean the query parser has parsed it to tokens "Rahul -" and
> "kumar"?
> Even if this was the case solr should be able to retrieve the documents
> because I have indexed all the documents based on n-grams as well.
>
> Thanks,
> Prathik
>
>
> On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> Try adding &debug=query to the url. What I think you'll find is that
>> you're running into a common issue, the difference between query
>> parsing and analysis.
>>
>> When you submit anything with whitespace in it, the query parser will
>> break it up _before_ it gets to the analysis part. You should see
>> something in the debug portion of the query like
>> field:rahul field:kumar and possibly even field:-
>>
>> These are searched as separate tokens. By specifying KeywordTokenizer,
>> at index time you'll have exactly one token, rahul-kumar, in the index,
>> which will not match any of the separated tokens.
>>
>> Try escaping the spaces with backslash. You could also try quoting the
>> input, although that has some phrase implications.
>>
>> Do you really want this search to fail on just searching "rahul" though?
>> Perhaps KeywordTokenizer isn't best here; it depends upon your use case...
>>
>> Best,
>> Erick
>>
>>
>> On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <
>> prathik.puthran87@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am facing an issue wherein Solr does not retrieve the indexed word
>>> in some cases.
>>>
>>> This happens whenever the indexed word contains the substring " - "
>>> (quotes for clarity), i.e. a prefix followed by a space, then '-',
>>> then another space, and then the rest of the word.
>>> When I search with the exact string as the query, Solr returns no
>>> results.
>>>
>>> Example:
>>> Indexed word --> "Rahul - kumar"  (quotes for clarity)
>>> If I search with the search query as below Solr gives no results
>>> Search query --> "Rahul - kumar"  (quotes for clarity)
>>>
>>> However the below search query returns the results
>>> Search query --> "Rahul kumar"
>>>
>>> Can you please let me know what I am doing wrong here and what should I
>>> do to ensure the first query i.e. "Rahul - kumar" returns the documents
>>> indexed using it.
>>>
>>> Below are the analyzers I am using:
>>> Index time analyzer components:
>>> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>    pattern="([^A-Za-z0-9 ])" replacement=""/>
>>> 2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> 3) <filter class="solr.LowerCaseFilterFactory"/>
>>> 4) <filter class="solr.WordDelimiterFilterFactory"
>>>    generateWordParts="1" preserveOriginal="1"/>
>>> 5) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>>    maxGramSize="50" side="front"/>
>>> 6) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>>    maxGramSize="50" side="back"/>
>>>
>>> Query time analyzer components:
>>> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>    pattern="([^A-Za-z0-9 ])" replacement=""/>
>>> 2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> 3) <filter class="solr.LowerCaseFilterFactory"/>
>>> 4) <filter class="solr.WordDelimiterFilterFactory"
>>>    generateWordParts="1" preserveOriginal="1"/>
>>>
>>>
>>> Can you please let me know how I can fix this?
>>>
>>> Thanks,
>>> Prathik
>>>
>>>
>>>
>>
> 

