lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From theDude_2 <aornst...@webmd.net>
Subject Re: A Challenge!: Combining 2 searches into a single resultset?
Date Mon, 20 Apr 2009 13:57:46 GMT

Thanks!

I wound up indexing both versions in the same index, and boosting the words
that appeared in the "good word" list!

Thanks again for your advice!


Matthew Hall-7 wrote:
> 
> Erm, I likely should have mentioned that this technique requires the use 
> of a MultiFieldQueryParser.
> 
> Matt
> 
> Matthew Hall wrote:
>> If you can build an analyzer that tokenizes the second field so that 
>> it filters out the words you don't want, you can then take advantage 
>> of more intelligent queries as well.
>>
>>
>> So for the example that pjaol wrote, the query would become something 
>> like this:
>>
>> Query= body:(game OR redskins) keyword:(redskins)^10
>>
>> Depending on your corpus this may or may not be possible, the 
>> determining factor being whether or not the list of words being 
>> removed from each document to create the second field varies.
>>
>> (more specifically I mean, in some documents you remove the word game, 
>> and in others you don't, if this is the case this technique won't work 
>> for you.)
>>
>> Matt
>>
>>
>>
>> theDude_2 wrote:
>>> Ah, Interesting... I didnt think of that!  I will try it and report back
>>>
>>>
>>> pjaol wrote:
>>>  
>>>> Why not put the keywords into the same document as another field? and
>>>> search
>>>> both fields
>>>> at once, you can then use lucene syntax to give a boosting to the 
>>>> keyword
>>>> fields.
>>>> e.g.
>>>> body:A good game last night by the redskins
>>>> keyword: redskins
>>>>
>>>> Query= body:(game OR redskins) keyword:(game OR redskins)^10
>>>>
>>>> And adjust the boosting until you're happy.
>>>> Check out for querying multiple fields
>>>> http://wiki.apache.org/lucene-java/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d77

>>>>
>>>>
>>>> You might even want to consider Solr and it's dismax search component
>>>> http://wiki.apache.org/solr/DisMaxRequestHandler
>>>> to make it easier
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 17, 2009 at 11:19 AM, theDude_2 <aornstein@webmd.net> 
>>>> wrote:
>>>>
>>>>    
>>>>> I appreciate your response, and read the wiki article concerning the
>>>>> Federated search
>>>>> and
>>>>>
>>>>> I'm not sure that my project falls into the "Federated Search" 
>>>>> bucket...
>>>>>
>>>>> What I've done is created 2 indexes created with the same documents.
>>>>> One index, contains the full documents - great for pure relevancy 
>>>>> search
>>>>> The second index: contains all of the same documents, but a small 
>>>>> subset
>>>>> of
>>>>> each documents contents - only allowing words to be indexed that we 
>>>>> deem
>>>>> as
>>>>> "good words" -
>>>>>
>>>>> (for example) if this was a football article database
>>>>> Index 1: would index 100% of the article about the Redskins and the 
>>>>> New
>>>>> York
>>>>> Giants
>>>>> Index 2: would index the same article by only the "good words" in the
>>>>> document like Redskins, Giants, Quarterback, Linebacker, etc.
>>>>>
>>>>> What I'm trying to do, if it's even possible! is run the search on 
>>>>> both
>>>>> indexes containing references to the same article, and multiple the
>>>>> scores
>>>>> together to get a final score that would represent something like a
>>>>> "relative AND good word" score....
>>>>>
>>>>> Figuring that if a user searches on "Who is the Quarterback for the
>>>>> Giants"
>>>>> this will get the user an article that is both related to the 
>>>>> query, and
>>>>> deemed "important" to the query...
>>>>>
>>>>> I will look further into federated search and related items, but I 
>>>>> think
>>>>> that lucene probably wont be able to help me with this, am I right?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ------------
>>>>>
>>>>> pjaol wrote:
>>>>>      
>>>>>> I'd start by doing some research on the question rather than 
>>>>>> asking for
>>>>>>         
>>>>> a
>>>>>      
>>>>>> solution..
>>>>>> What your asking for can be considered 'Federated Search'
>>>>>> http://en.wikipedia.org/wiki/Federated_search
>>>>>>
>>>>>> And it can be conceived in as many ways as you have document 
>>>>>> types. Any
>>>>>> answer will probably end up
>>>>>> customized and weighted by your document silo value, usually 
>>>>>> companies
>>>>>> weight those by business rules
>>>>>> rather than head down the path of federated search, as it's just
>>>>>>         
>>>>> quicker
>>>>>      
>>>>>> and
>>>>>> cheaper, and you can accomplish more.
>>>>>> e.g
>>>>>> Medication = score *2  (as higher advertising incentives)
>>>>>> Diseases = score
>>>>>> Books = score * 0.75  ( thousands of books, which nobody buys etc..)
>>>>>>
>>>>>> You might also want to try consolidating your data into 1 schema,
and
>>>>>> consider layering or collapsing results
>>>>>> based on type.
>>>>>>
>>>>>> P
>>>>>>
>>>>>> On Fri, Apr 17, 2009 at 10:39 AM, theDude_2 <aornstein@webmd.net>
>>>>>>         
>>>>> wrote:
>>>>>      
>>>>>>> (bump) - any thoughts?
>>>>>>> ----
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> theDude_2 wrote:
>>>>>>>          
>>>>>>>> hi!
>>>>>>>>
>>>>>>>> I am trying to do something a little unique...
>>>>>>>>
>>>>>>>> I have a 90k text documents that I am trying to search
>>>>>>>> Search A: indexes and searches the documents using regular

>>>>>>>> relevancy
>>>>>>>> search
>>>>>>>> Search B: indexes and searches the documents using a smaller
subset
>>>>>>>>             
>>>>> of
>>>>>      
>>>>>>>> "key" words that I have chosen
>>>>>>>>
>>>>>>>> This gives me 2 seperate scores: Score A, and Score B...
>>>>>>>>
>>>>>>>> I am trying to show the top 10 results of the scores combined

>>>>>>>> so....
>>>>>>>>
>>>>>>>> FinalScoretextDoc = (scoreA_of_td1 * 0.5) * (scoreB_of_td1
* 0.5)
>>>>>>>>
>>>>>>>> While it seems straightforward, I do not want to calculate
the
>>>>>>>>             
>>>>> scores
>>>>>      
>>>>>>> of
>>>>>>>          
>>>>>>>> all the documents outside of lucene.  How can I integrate
this
>>>>>>>>             
>>>>> better
>>>>>      
>>>>>>> into
>>>>>>>          
>>>>>>>> the lucene search engine?  Is this possible to do by any
simple
>>>>>>>>             
>>>>> means?
>>>>>      
>>>>>>>> Thanks guys + gals!
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>> -- 
>>>>>>> View this message in context:
>>>>>>>
>>>>>>>           
>>>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23098961.html

>>>>>
>>>>>      
>>>>>>> Sent from the Lucene - Java Users mailing list archive at 
>>>>>>> Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------

>>>>>>>
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>         
>>>>> -- 
>>>>> View this message in context:
>>>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23099744.html

>>>>>
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>       
>>>>     
>>>
>>>   
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23137164.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message