lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Hall <mh...@informatics.jax.org>
Subject Re: A Challenge!: Combining 2 searches into a single resultset?
Date Fri, 17 Apr 2009 16:47:07 GMT
Erm, I likely should have mentioned that this technique requires the use 
of a MultiFieldQueryParser.

Matt

Matthew Hall wrote:
> If you can build an analyzer that tokenizes the second field so that 
> it filters out the words you don't want, you can then take advantage 
> of more intelligent queries as well.
>
>
> So for the example that pjaol wrote, the query would become something 
> like this:
>
> Query= body:(game OR redskins) keyword:(redskins)^10
>
> Depending on your corpus this may or may not be possible, the 
> determining factor being whether or not the list of words being 
> removed from each document to create the second field varies.
>
> (more specifically I mean, in some documents you remove the word game, 
> and in others you don't, if this is the case this technique won't work 
> for you.)
>
> Matt
>
>
>
> theDude_2 wrote:
>> Ah, Interesting... I didnt think of that!  I will try it and report back
>>
>>
>> pjaol wrote:
>>  
>>> Why not put the keywords into the same document as another field? and
>>> search
>>> both fields
>>> at once, you can then use lucene syntax to give a boosting to the 
>>> keyword
>>> fields.
>>> e.g.
>>> body:A good game last night by the redskins
>>> keyword: redskins
>>>
>>> Query= body:(game OR redskins) keyword:(game OR redskins)^10
>>>
>>> And adjust the boosting until you're happy.
>>> Check out for querying multiple fields
>>> http://wiki.apache.org/lucene-java/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d77

>>>
>>>
>>> You might even want to consider Solr and it's dismax search component
>>> http://wiki.apache.org/solr/DisMaxRequestHandler
>>> to make it easier
>>>
>>>
>>>
>>>
>>> On Fri, Apr 17, 2009 at 11:19 AM, theDude_2 <aornstein@webmd.net> 
>>> wrote:
>>>
>>>    
>>>> I appreciate your response, and read the wiki article concerning the
>>>> Federated search
>>>> and
>>>>
>>>> I'm not sure that my project falls into the "Federated Search" 
>>>> bucket...
>>>>
>>>> What I've done is created 2 indexes created with the same documents.
>>>> One index, contains the full documents - great for pure relevancy 
>>>> search
>>>> The second index: contains all of the same documents, but a small 
>>>> subset
>>>> of
>>>> each documents contents - only allowing words to be indexed that we 
>>>> deem
>>>> as
>>>> "good words" -
>>>>
>>>> (for example) if this was a football article database
>>>> Index 1: would index 100% of the article about the Redskins and the 
>>>> New
>>>> York
>>>> Giants
>>>> Index 2: would index the same article by only the "good words" in the
>>>> document like Redskins, Giants, Quarterback, Linebacker, etc.
>>>>
>>>> What I'm trying to do, if it's even possible! is run the search on 
>>>> both
>>>> indexes containing references to the same article, and multiple the
>>>> scores
>>>> together to get a final score that would represent something like a
>>>> "relative AND good word" score....
>>>>
>>>> Figuring that if a user searches on "Who is the Quarterback for the
>>>> Giants"
>>>> this will get the user an article that is both related to the 
>>>> query, and
>>>> deemed "important" to the query...
>>>>
>>>> I will look further into federated search and related items, but I 
>>>> think
>>>> that lucene probably wont be able to help me with this, am I right?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ------------
>>>>
>>>> pjaol wrote:
>>>>      
>>>>> I'd start by doing some research on the question rather than 
>>>>> asking for
>>>>>         
>>>> a
>>>>      
>>>>> solution..
>>>>> What your asking for can be considered 'Federated Search'
>>>>> http://en.wikipedia.org/wiki/Federated_search
>>>>>
>>>>> And it can be conceived in as many ways as you have document 
>>>>> types. Any
>>>>> answer will probably end up
>>>>> customized and weighted by your document silo value, usually 
>>>>> companies
>>>>> weight those by business rules
>>>>> rather than head down the path of federated search, as it's just
>>>>>         
>>>> quicker
>>>>      
>>>>> and
>>>>> cheaper, and you can accomplish more.
>>>>> e.g
>>>>> Medication = score *2  (as higher advertising incentives)
>>>>> Diseases = score
>>>>> Books = score * 0.75  ( thousands of books, which nobody buys etc..)
>>>>>
>>>>> You might also want to try consolidating your data into 1 schema, and
>>>>> consider layering or collapsing results
>>>>> based on type.
>>>>>
>>>>> P
>>>>>
>>>>> On Fri, Apr 17, 2009 at 10:39 AM, theDude_2 <aornstein@webmd.net>
>>>>>         
>>>> wrote:
>>>>      
>>>>>> (bump) - any thoughts?
>>>>>> ----
>>>>>>
>>>>>>
>>>>>>
>>>>>> theDude_2 wrote:
>>>>>>          
>>>>>>> hi!
>>>>>>>
>>>>>>> I am trying to do something a little unique...
>>>>>>>
>>>>>>> I have a 90k text documents that I am trying to search
>>>>>>> Search A: indexes and searches the documents using regular 
>>>>>>> relevancy
>>>>>>> search
>>>>>>> Search B: indexes and searches the documents using a smaller
subset
>>>>>>>             
>>>> of
>>>>      
>>>>>>> "key" words that I have chosen
>>>>>>>
>>>>>>> This gives me 2 seperate scores: Score A, and Score B...
>>>>>>>
>>>>>>> I am trying to show the top 10 results of the scores combined

>>>>>>> so....
>>>>>>>
>>>>>>> FinalScoretextDoc = (scoreA_of_td1 * 0.5) * (scoreB_of_td1 *
0.5)
>>>>>>>
>>>>>>> While it seems straightforward, I do not want to calculate the
>>>>>>>             
>>>> scores
>>>>      
>>>>>> of
>>>>>>          
>>>>>>> all the documents outside of lucene.  How can I integrate this
>>>>>>>             
>>>> better
>>>>      
>>>>>> into
>>>>>>          
>>>>>>> the lucene search engine?  Is this possible to do by any simple
>>>>>>>             
>>>> means?
>>>>      
>>>>>>> Thanks guys + gals!
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> -- 
>>>>>> View this message in context:
>>>>>>
>>>>>>           
>>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23098961.html

>>>>
>>>>      
>>>>>> Sent from the Lucene - Java Users mailing list archive at 
>>>>>> Nabble.com.
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------

>>>>>>
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>           
>>>>>         
>>>> -- 
>>>> View this message in context:
>>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23099744.html

>>>>
>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>       
>>>     
>>
>>   
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message