lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matthew Hall <mh...@informatics.jax.org>
Subject Re: A Challenge!: Combining 2 searches into a single resultset?
Date Fri, 17 Apr 2009 16:45:06 GMT
If you can build an analyzer that tokenizes the second field so that it 
filters out the words you don't want, you can then take advantage of 
more intelligent queries as well.


So for the example that pjaol wrote, the query would become something 
like this:

Query= body:(game OR redskins) keyword:(redskins)^10

Depending on your corpus this may or may not be possible, the 
determining factor being whether or not the list of words being removed 
from each document to create the second field varies.

(more specifically I mean, in some documents you remove the word game, 
and in others you don't, if this is the case this technique won't work 
for you.)

Matt



theDude_2 wrote:
> Ah, Interesting... I didnt think of that!  I will try it and report back
>
>
> pjaol wrote:
>   
>> Why not put the keywords into the same document as another field? and
>> search
>> both fields
>> at once, you can then use lucene syntax to give a boosting to the keyword
>> fields.
>> e.g.
>> body:A good game last night by the redskins
>> keyword: redskins
>>
>> Query= body:(game OR redskins) keyword:(game OR redskins)^10
>>
>> And adjust the boosting until you're happy.
>> Check out for querying multiple fields
>> http://wiki.apache.org/lucene-java/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d77
>>
>> You might even want to consider Solr and it's dismax search component
>> http://wiki.apache.org/solr/DisMaxRequestHandler
>> to make it easier
>>
>>
>>
>>
>> On Fri, Apr 17, 2009 at 11:19 AM, theDude_2 <aornstein@webmd.net> wrote:
>>
>>     
>>> I appreciate your response, and read the wiki article concerning the
>>> Federated search
>>> and
>>>
>>> I'm not sure that my project falls into the "Federated Search" bucket...
>>>
>>> What I've done is created 2 indexes created with the same documents.
>>> One index, contains the full documents - great for pure relevancy search
>>> The second index: contains all of the same documents, but a small subset
>>> of
>>> each documents contents - only allowing words to be indexed that we deem
>>> as
>>> "good words" -
>>>
>>> (for example) if this was a football article database
>>> Index 1: would index 100% of the article about the Redskins and the New
>>> York
>>> Giants
>>> Index 2: would index the same article by only the "good words" in the
>>> document like Redskins, Giants, Quarterback, Linebacker, etc.
>>>
>>> What I'm trying to do, if it's even possible! is run the search on both
>>> indexes containing references to the same article, and multiple the
>>> scores
>>> together to get a final score that would represent something like a
>>> "relative AND good word" score....
>>>
>>> Figuring that if a user searches on "Who is the Quarterback for the
>>> Giants"
>>> this will get the user an article that is both related to the query, and
>>> deemed "important" to the query...
>>>
>>> I will look further into federated search and related items, but I think
>>> that lucene probably wont be able to help me with this, am I right?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ------------
>>>
>>> pjaol wrote:
>>>       
>>>> I'd start by doing some research on the question rather than asking for
>>>>         
>>> a
>>>       
>>>> solution..
>>>> What your asking for can be considered 'Federated Search'
>>>> http://en.wikipedia.org/wiki/Federated_search
>>>>
>>>> And it can be conceived in as many ways as you have document types. Any
>>>> answer will probably end up
>>>> customized and weighted by your document silo value, usually companies
>>>> weight those by business rules
>>>> rather than head down the path of federated search, as it's just
>>>>         
>>> quicker
>>>       
>>>> and
>>>> cheaper, and you can accomplish more.
>>>> e.g
>>>> Medication = score *2  (as higher advertising incentives)
>>>> Diseases = score
>>>> Books = score * 0.75  ( thousands of books, which nobody buys etc..)
>>>>
>>>> You might also want to try consolidating your data into 1 schema, and
>>>> consider layering or collapsing results
>>>> based on type.
>>>>
>>>> P
>>>>
>>>> On Fri, Apr 17, 2009 at 10:39 AM, theDude_2 <aornstein@webmd.net>
>>>>         
>>> wrote:
>>>       
>>>>> (bump) - any thoughts?
>>>>> ----
>>>>>
>>>>>
>>>>>
>>>>> theDude_2 wrote:
>>>>>           
>>>>>> hi!
>>>>>>
>>>>>> I am trying to do something a little unique...
>>>>>>
>>>>>> I have a 90k text documents that I am trying to search
>>>>>> Search A: indexes and searches the documents using regular relevancy
>>>>>> search
>>>>>> Search B: indexes and searches the documents using a smaller subset
>>>>>>             
>>> of
>>>       
>>>>>> "key" words that I have chosen
>>>>>>
>>>>>> This gives me 2 seperate scores: Score A, and Score B...
>>>>>>
>>>>>> I am trying to show the top 10 results of the scores combined so....
>>>>>>
>>>>>> FinalScoretextDoc = (scoreA_of_td1 * 0.5) * (scoreB_of_td1 * 0.5)
>>>>>>
>>>>>> While it seems straightforward, I do not want to calculate the
>>>>>>             
>>> scores
>>>       
>>>>> of
>>>>>           
>>>>>> all the documents outside of lucene.  How can I integrate this
>>>>>>             
>>> better
>>>       
>>>>> into
>>>>>           
>>>>>> the lucene search engine?  Is this possible to do by any simple
>>>>>>             
>>> means?
>>>       
>>>>>> Thanks guys + gals!
>>>>>>
>>>>>>
>>>>>>             
>>>>> --
>>>>> View this message in context:
>>>>>
>>>>>           
>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23098961.html
>>>       
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>           
>>>>         
>>> --
>>> View this message in context:
>>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23099744.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>       
>>     
>
>   


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message