lucene-solr-user mailing list archives

From Mike Sokolov <soko...@ifactory.com>
Subject Re: How do I this in Solr?
Date Wed, 27 Oct 2010 13:43:40 GMT
Yes, I missed that requirement (as Steven also pointed out in a private
e-mail).  I now agree that the combinatorics are required.

Another possibility to consider (if the queries are large, which
actually seems unlikely) is to use the default behavior where all terms
are optional, sort by relevance, and truncate the result list on the
client side at the first document that contains an unwanted term.  I *think*
the scoring should rank documents containing only the searched-for terms
first, although that may not hold if there are a lot of repeated terms.
Also, the result counts will be screwy, since they include every document
the client later discards.

-Mike
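A minimal client-side sketch of the truncation idea above, assuming the results have already come back sorted by relevance as plain strings and that a lowercase whitespace split is close enough to the field's analyzer (both assumptions; `tokenize` and `truncate_at_unwanted_term` are hypothetical names, and the ranking below is invented for illustration). It is only as correct as the ranking: a wanted document that scores below an unwanted one is silently dropped.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split stands in for Solr's analyzer.
    return text.lower().split()


def truncate_at_unwanted_term(ranked_docs: list[str], query: str) -> list[str]:
    """Keep results up to (not including) the first doc with a non-query word.

    Assumes ranked_docs is already sorted by relevance, e.g. the default
    ordering of an all-terms-optional query.
    """
    allowed = set(tokenize(query))
    kept = []
    for doc in ranked_docs:
        if not set(tokenize(doc)) <= allowed:
            break  # first document containing an unwanted term: stop here
        kept.append(doc)
    return kept


# Hypothetical relevance ordering, for illustration only.
ranked = ["samsung andriod", "samsung", "GPS", "mobile with GPS", "nokia andriod"]
print(truncate_at_unwanted_term(ranked, "samsung andriod GPS"))
# -> ['samsung andriod', 'samsung', 'GPS']
```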

On 10/27/2010 09:34 AM, Toke Eskildsen wrote:
> That does not work either, as it requires that all the terms in the query
> are present in the document. The original poster did not state this
> requirement. On the contrary, his examples were mostly single-word
> matches, implying an OR-search at the core.
>
> The query explosion still seems like the only workable idea. Maybe Varun
> could comment on the maximum number of terms that his queries will
> contain?
>
> Regards,
> Toke Eskildsen
>
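One way to write out the query explosion, sketched under assumptions: each document carries a `word_count` field holding its number of distinct words (the field name follows the word-count idea elsewhere in the thread), the terms need no escaping, and standard Lucene query syntax applies. `explode_query` is a hypothetical helper, not a Solr API. Each clause pairs a subset of the query terms with that subset's size, so a document can only match when its words are exactly that subset.

```python
from itertools import combinations
from typing import Optional


def explode_query(terms: list[str], field: str = "word_count",
                  max_words: Optional[int] = None) -> str:
    """OR together one clause per non-empty subset of the query terms.

    Each clause requires every term in the subset AND field:<subset size>.
    Assumes `field` stores the number of distinct words in each document
    and that the terms need no query-syntax escaping.
    """
    unique = list(dict.fromkeys(terms))  # drop repeated query terms, keep order
    largest = len(unique) if max_words is None else min(max_words, len(unique))
    clauses = []
    for size in range(1, largest + 1):
        for subset in combinations(unique, size):
            required = " AND ".join(subset)
            clauses.append(f"({required} AND {field}:{size})")
    return " OR ".join(clauses)


print(explode_query(["samsung", "andriod", "GPS"]))
```

The clause count is 2^n - 1 for n distinct terms (7 for three terms, 1023 for ten), which is where the performance worry about long queries comes from. Since the documents hold at most 15 words, `max_words=15` drops the larger subsets, though that only helps once a query has more than 15 terms.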
> On Wed, 2010-10-27 at 15:02 +0200, Mike Sokolov wrote:
>
>> Right, my point was to combine this with the previous approaches to
>> form a query like:
>>
>> samsung AND android AND GPS AND word_count:3
>>
>> in order to exclude documents containing additional words. This would
>> avoid the combinatoric explosion problem others had alluded to earlier.
>> Of course, this would fail because android is "mis-" spelled :)
>>
>> -Mike
>>
>> On 10/27/2010 08:45 AM, Steven A Rowe wrote:
>>
>>> I'm pretty sure the word-count strategy won't work.
>>>
>>>> If I search with the text "samsung andriod GPS", search results
>>>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>
>>> Using the word-count strategy, a document containing "samsung andriod PDQ"
>>> would be a hit, but Varun doesn't want it, because it contains a word that
>>> is not in the query.
>>>
>>> Steve
>>>
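A small simulation of Steve's counterexample, assuming the word-count strategy means "shares at least one term with the query (default OR) and has the same number of words", and approximating the analyzer with a lowercase whitespace split. `word_count_or_match` and `wanted` are hypothetical names.

```python
def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split approximates the field's analyzer.
    return text.lower().split()


def word_count_or_match(doc: str, query: str) -> bool:
    """Word-count strategy with OR semantics: shares a term, same word count."""
    doc_words, query_words = tokenize(doc), tokenize(query)
    shares_a_term = bool(set(doc_words) & set(query_words))
    return shares_a_term and len(doc_words) == len(query_words)


def wanted(doc: str, query: str) -> bool:
    """Varun's criterion: every word of the document appears in the query."""
    return set(tokenize(doc)) <= set(tokenize(query))


query = "samsung andriod GPS"
for doc in ["samsung andriod PDQ", "samsung andriod", "samsung"]:
    print(f"{doc!r}: strategy={word_count_or_match(doc, query)}, wanted={wanted(doc, query)}")
```

The first row is Steve's false positive; the other two show that the same check also rejects shorter documents Varun does want.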
>>>
>>>        
>>>> -----Original Message-----
>>>> From: Michael Sokolov [mailto:sokolov@ifactory.com]
>>>> Sent: Wednesday, October 27, 2010 7:44 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: How do I this in Solr?
>>>>
>>>> You might try adding a field containing the word count and making sure
>>>> that it matches the query's word count?
>>>>
>>>> This would require you to tokenize the query and document yourself,
>>>> perhaps.
>>>>
>>>> -Mike
>>>>
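A sketch of the index-side half of this suggestion, tokenizing documents yourself before indexing. The field names `id`, `text`, and `word_count` are hypothetical, the lowercase whitespace split is an assumption that has to stay in step with the analyzer on the indexed field, and counting distinct words (rather than all tokens) is a design choice so that a document like "samsung samsung" counts as one word.

```python
def distinct_words(text: str) -> set[str]:
    # Hypothetical tokenizer: keep it in step with the analyzer on the indexed
    # text field, or index-time and query-time counts will disagree.
    return set(text.lower().split())


def with_word_count(doc_id: str, text: str) -> dict:
    """Build a document carrying a precomputed word_count (distinct words)."""
    return {"id": doc_id, "text": text, "word_count": len(distinct_words(text))}


texts = ["nokia n95", "GPS", "mobile with GPS", "samsung samsung"]
docs = [with_word_count(str(i), t) for i, t in enumerate(texts)]

# The query side is counted the same way before building the Solr query.
query_count = len(distinct_words("samsung andriod GPS"))

for doc in docs:
    print(doc)
print("query word count:", query_count)
```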
>>>>
>>>>          
>>>>> -----Original Message-----
>>>>> From: Varun Gupta [mailto:varun.vgupta@gmail.com]
>>>>> Sent: Tuesday, October 26, 2010 11:26 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: How do I this in Solr?
>>>>>
>>>>> Thanks everybody for the inputs.
>>>>>
>>>>> Looks like Steven's solution is the closest one but will lead
>>>>> to performance issues when the query string has many terms.
>>>>>
>>>>> I will try to implement the two filters suggested by Steven
>>>>> and see how the performance matches up.
>>>>>
>>>>> --
>>>>> Thanks
>>>>> Varun Gupta
>>>>>
>>>>>
>>>>> On Wed, Oct 27, 2010 at 8:04 AM, scott chu
>>>>> <scott.chu@udngroup.com> wrote:
>>>>>
>>>>>
>>>>>> I think you have to write a "yet exact match" handler yourself (I say
>>>>>> "yet" because it's not quite the exact match we normally know). Steve's
>>>>>> answer is quite close to your request; you can do further work based
>>>>>> on his solution.
>>>>>>
>>>>>> As a last step, I suggest you strip all blanks from the query string
>>>>>> and from each query result, respectively, and return only those results
>>>>>> whose length equals the query string's.
>>>>>>
>>>>>> For example, given:
>>>>>> *query string = "Samsung with GPS"
>>>>>> *query results:
>>>>>> result 1 = "Samsung has lots of mobile with GPS"
>>>>>> result 2 = "with GPS Samsung"
>>>>>> result 3 = "GPS mobile with vendors, such as Sony, Samsung"
>>>>>>
>>>>>> they become:
>>>>>> *query string = "SamsungwithGPS" (length = 14)
>>>>>> *query results:
>>>>>> result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
>>>>>> result 2 = "withGPSSamsung" (length = 14)
>>>>>> result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
>>>>>>
>>>>>> so result 2 matches your request.
>>>>>>
>>>>>> This way, you avoid the extra work of handling case sensitivity and
>>>>>> rearranged word order. Furthermore, you can refine it, for example by
>>>>>> removing other whitespace characters.
>>>>>>
>>>>>> Scott @ Taiwan
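Scott's length check fits in a few lines of Python; the helper names `squeeze` and `same_squeezed_length` are hypothetical, and result 2 is read as "with GPS Samsung". Keep in mind that equal length is necessary but not sufficient: any other result that is 14 characters long without its blanks would pass as well, so this only makes sense as a last-step filter over results that already matched.

```python
def squeeze(text: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines)."""
    return "".join(text.split())


def same_squeezed_length(query: str, result: str) -> bool:
    # Necessary but not sufficient: different words can have equal total length.
    return len(squeeze(query)) == len(squeeze(result))


query = "Samsung with GPS"
results = [
    "Samsung has lots of mobile with GPS",
    "with GPS Samsung",
    "GPS mobile with vendors, such as Sony, Samsung",
]
print([len(squeeze(r)) for r in results])  # -> [29, 14, 39]
print([r for r in results if same_squeezed_length(query, r)])  # -> ['with GPS Samsung']
```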
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "Varun Gupta" <varun.vgupta@gmail.com>
>>>>>> To: <solr-user@lucene.apache.org>
>>>>>> Sent: Tuesday, October 26, 2010 9:07 PM
>>>>>> Subject: How do I this in Solr?
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a lot of small documents (each containing 1 to 15 words) indexed
>>>>>>> in Solr. For a search query, I want the search results to contain only
>>>>>>> those documents that satisfy this criterion: "All of the words of the
>>>>>>> search result document are present in the search query."
>>>>>>>
>>>>>>> For example:
>>>>>>> If I have the following documents indexed: "nokia n95", "GPS",
>>>>>>> "android", "samsung", "samsung andriod", "nokia andriod",
>>>>>>> "mobile with GPS"
>>>>>>>
>>>>>>> If I search with the text "samsung andriod GPS", the search results
>>>>>>> should only contain "samsung", "GPS", "andriod" and "samsung andriod".
>>>>>>>
>>>>>>> Is there a way to do this in Solr?
>>>>>>>
>>>>>>> --
>>>>>>> Thanks
>>>>>>> Varun Gupta
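For checking any of the Solr-side approaches in this thread, a brute-force reference check of Varun's criterion can help. This is a client-side sketch, assuming a lowercase whitespace split approximates the analyzer; `tokenize` and `satisfies` are hypothetical names. Note that the indexed "android" does not match the query's "andriod", which is the spelling problem mentioned further up the thread.

```python
def tokenize(text: str) -> set[str]:
    # Assumption: lowercase + whitespace split stands in for Solr's analyzer.
    return set(text.lower().split())


def satisfies(doc: str, query: str) -> bool:
    """True if every word of the document also appears in the query."""
    return tokenize(doc) <= tokenize(query)


indexed = ["nokia n95", "GPS", "android", "samsung",
           "samsung andriod", "nokia andriod", "mobile with GPS"]
print([d for d in indexed if satisfies(d, "samsung andriod GPS")])
# -> ['GPS', 'samsung', 'samsung andriod']
```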
>>>
>>>        
>
>    
