lucene-solr-user mailing list archives

From Prathik Puthran <prathik.puthra...@gmail.com>
Subject Re: Help in resolving the below retrieval issue
Date Tue, 10 Sep 2013 15:47:47 GMT
Thanks Erick for the response.
I debugged the query. Below is the debug section of the response:

<str name="rawquerystring">Rahul - kumar</str>
<str name="querystring">Rahul - kumar</str>
<str name="parsedquery">+text:Rahul -text:kumar</str>
<str name="parsedquery_toString">+text:Rahul -text:kumar</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>
<arr name="filter_queries"><str>Rahul - kumar</str></arr>
<arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>


Does this mean the query parser has split it into the tokens "Rahul -" and "kumar"?
Even if that were the case, Solr should still be able to retrieve the documents,
because I have also indexed all the documents as n-grams.
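
To illustrate the escaping suggested in the quoted reply: the parsedquery
above contains -text:kumar, which indicates the bare "-" was read as the
prohibit (NOT) operator rather than as text. Below is a minimal Python
sketch of client-side escaping. The helper name escape_query_chars and the
exact character set are assumptions made for this example and do not come
from this thread; SolrJ ships a comparable helper,
ClientUtils.escapeQueryChars.

```python
# Characters the classic Lucene query parser treats as syntax.
# (Assumed set for this sketch; check it against your Solr version.)
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\/&|')


def escape_query_chars(text):
    """Backslash-escape query syntax characters and whitespace.

    Hypothetical helper for illustration only.
    """
    out = []
    for ch in text:
        if ch in LUCENE_SPECIAL or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(escape_query_chars("Rahul - kumar"))  # prints: Rahul\ \-\ kumar
```

The escaped string would then be sent (URL-encoded) as the q parameter, so
that the whole input should reach the query-time analyzer as one piece
instead of being split on whitespace first.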

Thanks,
Prathik


On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Try adding &debug=query to the URL. I think you'll find that you're
> running into a common issue: the difference between query parsing and
> analysis.
>
> When you submit anything with whitespace in it, the query parser will
> break it up _before_ it gets to the analysis part. You should see
> something in the debug portion of the query like
> field:rahul field:kumar and possibly even field:-
>
> These are searched as separate tokens. By specifying KeywordTokenizer, at
> index time you'll have exactly one token, rahul-kumar, in the index, which
> will not match any of the separated tokens.
>
> Try escaping the spaces with a backslash. You could also try quoting the
> input, although that has some phrase implications.
>
> Do you really want this search to fail on just searching "rahul", though?
> Perhaps KeywordTokenizer isn't best here; it depends on your use case...
>
> Best,
> Erick
>
>
> On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <
> prathik.puthran87@gmail.com> wrote:
>
>> Hi,
>>
>> I am facing an issue wherein Solr does not retrieve the indexed
>> word in some cases.
>>
>> This happens whenever the indexed word contains the string " - " (quotes
>> for clarity) as a substring, i.e. a word prefix followed by a space, then
>> '-', then another space, and then the rest of the word.
>> When I search with the exact string as the search query, Solr returns no
>> results.
>>
>> Example:
>> Indexed word --> "Rahul - kumar"  (quotes for clarity)
>> If I search with the search query as below Solr gives no results
>> Search query --> "Rahul - kumar"  (quotes for clarity)
>>
>> However the below search query returns the results
>> Search query --> "Rahul kumar"
>>
>> Can you please let me know what I am doing wrong here and what I should
>> do to ensure the first query, i.e. "Rahul - kumar", returns the documents
>> indexed with it?
>>
>> Below are the analyzers I am using:
>> Index time analyzer components:
>>  1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>> pattern="([^A-Za-z0-9 ])" replacement=""/>
>>  2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>  3) <filter class="solr.LowerCaseFilterFactory"/>
>>  4) <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> preserveOriginal="1"/>
>>  5) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>> maxGramSize="50" side="front"/>
>>  6) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>> maxGramSize="50" side="back"/>
>>
>> Query time analyzer components:
>>  1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>> pattern="([^A-Za-z0-9 ])" replacement=""/>
>>  2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>  3) <filter class="solr.LowerCaseFilterFactory"/>
>>  4) <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>> preserveOriginal="1"/>
>>
>>
>> Can you please let me know how I can fix this?
>>
>> Thanks,
>> Prathik
>>
>>
>
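
As a rough illustration of the analyzer chain quoted above, the first three
index-time stages can be approximated in Python. This is a sketch only: the
function name analyze_first_stages is hypothetical, and
WordDelimiterFilter and the two EdgeNGram filters are not modeled.

```python
import re


def analyze_first_stages(text):
    """Approximate the charFilter, KeywordTokenizer and LowerCaseFilter.

    Hypothetical helper; not Solr's actual implementation.
    """
    # 1) PatternReplaceCharFilter: drop everything except letters,
    #    digits and spaces (pattern "([^A-Za-z0-9 ])", replacement "").
    stripped = re.sub(r"[^A-Za-z0-9 ]", "", text)
    # 2) KeywordTokenizer: the whole input becomes a single token.
    tokens = [stripped]
    # 3) LowerCaseFilter: lowercase every token.
    return [t.lower() for t in tokens]


print(analyze_first_stages("Rahul - kumar"))  # prints: ['rahul  kumar']
```

Note that the char filter removes the '-' but leaves both surrounding
spaces, so the single token handed to the later filters contains two
consecutive spaces.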
