lucene-solr-dev mailing list archives

From anuvenk <anuvenkat...@hotmail.com>
Subject RE: Dealing with numbers in search terms
Date Fri, 04 Jan 2008 21:17:33 GMT

We have many documents in the index, including some FAQs and forms, that
contain occurrences of 'chapter 7' (mostly as a single phrase). I have the
synonym mappings below, and I have a feeling that many of them are
redundant:
chap 7 => bankruptcy
chapter => bankruptcy
chap => chapter
chapter 7 => bankruptcy
bankrupcy => bankruptcy
chap,7,chap7,chapter 7,chapter 7 bankruptcy,chap 7

I'm new to Solr and still learning how it works.

Here is the parsedquery_toString for the query 'chapter 7':

<str name="parsedquery_toString">
+(text:"(bankruptci chap 7) (7 chapter chap) 7 bankruptci"^0.8 |
((name:bankruptci name:chap)^2.0))~0.01 (text:"(bankruptci chap 7) (7
chapter chap) 7 bankruptci"~50^0.8 | ((name:bankruptci name:chap)^2.0))~0.01
</str>

Here is a portion of my request handler configuration:
     <float name="tie">0.01</float>
     <str name="qf">text^0.8 name^2.0</str>
     <!-- up to 3 clauses: all should match; 4: 3 should match; 5: 4 should
match; 6: 5 should match; above 6: 90% should match -->
     <str name="mm">3&lt;-1 4&lt;-1 5&lt;-1 6&lt;90%</str>
     <str name="pf">
         text^0.8 name^2.0
     </str>
     <int name="ps">50</int>
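
As a rough illustration of how an mm spec like the one above resolves to a
required number of clauses, here is a minimal Python sketch. The function
name min_should_match is hypothetical and not part of Solr; it only follows
the documented rules for conditional specs (n<x applies when there are more
than n optional clauses, negative integers mean "all but that many", and
percentages are rounded down). Note that the spec is written with &lt; in
the XML config but is plain < once decoded.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Sketch: resolve a dismax-style mm spec for a given number of optional clauses."""

    def apply(value: str, n: int) -> int:
        if value.endswith("%"):
            pct = int(value[:-1])
            # Percentages are rounded down; a negative percentage is the
            # share of clauses that may be missing.
            calc = n * abs(pct) // 100
            return n - calc if pct < 0 else calc
        k = int(value)
        # A negative integer means "all but this many clauses".
        return n + k if k < 0 else k

    # At or below the lowest threshold, every clause is required.
    result = clause_count
    for part in spec.split():
        if "<" in part:
            threshold, value = part.split("<", 1)
            if clause_count <= int(threshold):
                break
            result = apply(value, clause_count)
        else:
            result = apply(part, clause_count)
    # Never require more clauses than exist, or fewer than zero.
    return max(0, min(result, clause_count))


if __name__ == "__main__":
    spec = "3<-1 4<-1 5<-1 6<90%"
    for n in range(1, 9):
        print(n, "clauses ->", min_should_match(spec, n), "must match")
```

Under these rules a two-term query such as 'chapter 7' falls below the first
threshold, so both clauses would be required, assuming the query really is
split into two clauses before mm is applied.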

So for the search term 'chapter 7' I was expecting Solr to return all
documents that contain both 'chapter' and '7'. But it's puzzling that it was
returning some documents that contain only the number 7.

It would be very helpful if I could get some explanation of this behaviour.

Also, could you elaborate on what you mean by maximizing the conjunction of
the query and document term spaces?




Steven Rowe wrote:
> 
> Hi anuvenk,
> 
> On 01/03/2008 at 9:20 PM, anuvenk wrote:
>> I'm facing a crucial problem with numbers in the search terms for eg:
>> searching for chapter 7 returns a couple of results that are not related
>> to chapter 7 bankruptcy. The first result i get only has a match for the
>> number 7 which i should get rid of somehow. would adding a synonym like
>> 7 => bankruptcy or 7 => chapter 7 help?
> 
> This doesn't sound to me like a problem with numbers.  This sounds like a
> problem with terms that are used in more than one context, and that also
> happen to be numbers.
> 
> What you do to fix the problem really depends on what your use cases are. 
> Is "chapter 7" a raw user query?  How did you populate your index?  Is
> "chapter 7" a single term in the index?  One of the keys to successful
> search is maximizing the conjunction of the query and document term
> spaces.
> 
> Have you tried searching for "chapter 7" as a phrase?
> 
> Steve
> 
> 

-- 
View this message in context: http://www.nabble.com/Dealing-with-numbers-in-search-terms-tp14609699p14624992.html
Sent from the Solr - Dev mailing list archive at Nabble.com.

