lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From theDude_2 <aornst...@webmd.net>
Subject Re: A Challenge!: Combining 2 searches into a single resultset?
Date Fri, 17 Apr 2009 15:54:24 GMT

Ah, Interesting... I didnt think of that!  I will try it and report back


pjaol wrote:
> 
> Why not put the keywords into the same document as another field? and
> search
> both fields
> at once, you can then use lucene syntax to give a boosting to the keyword
> fields.
> e.g.
> body:A good game last night by the redskins
> keyword: redskins
> 
> Query= body:(game OR redskins) keyword:(game OR redskins)^10
> 
> And adjust the boosting until you're happy.
> Check out for querying multiple fields
> http://wiki.apache.org/lucene-java/LuceneFAQ#head-300f0756fdaa71f522c96a868351f716573f2d77
> 
> You might even want to consider Solr and it's dismax search component
> http://wiki.apache.org/solr/DisMaxRequestHandler
> to make it easier
> 
> 
> 
> 
> On Fri, Apr 17, 2009 at 11:19 AM, theDude_2 <aornstein@webmd.net> wrote:
> 
>>
>> I appreciate your response, and read the wiki article concerning the
>> Federated search
>> and
>>
>> I'm not sure that my project falls into the "Federated Search" bucket...
>>
>> What I've done is created 2 indexes created with the same documents.
>> One index, contains the full documents - great for pure relevancy search
>> The second index: contains all of the same documents, but a small subset
>> of
>> each documents contents - only allowing words to be indexed that we deem
>> as
>> "good words" -
>>
>> (for example) if this was a football article database
>> Index 1: would index 100% of the article about the Redskins and the New
>> York
>> Giants
>> Index 2: would index the same article by only the "good words" in the
>> document like Redskins, Giants, Quarterback, Linebacker, etc.
>>
>> What I'm trying to do, if it's even possible! is run the search on both
>> indexes containing references to the same article, and multiple the
>> scores
>> together to get a final score that would represent something like a
>> "relative AND good word" score....
>>
>> Figuring that if a user searches on "Who is the Quarterback for the
>> Giants"
>> this will get the user an article that is both related to the query, and
>> deemed "important" to the query...
>>
>> I will look further into federated search and related items, but I think
>> that lucene probably wont be able to help me with this, am I right?
>>
>>
>>
>>
>>
>>
>>
>>
>> ------------
>>
>> pjaol wrote:
>> >
>> > I'd start by doing some research on the question rather than asking for
>> a
>> > solution..
>> > What your asking for can be considered 'Federated Search'
>> > http://en.wikipedia.org/wiki/Federated_search
>> >
>> > And it can be conceived in as many ways as you have document types. Any
>> > answer will probably end up
>> > customized and weighted by your document silo value, usually companies
>> > weight those by business rules
>> > rather than head down the path of federated search, as it's just
>> quicker
>> > and
>> > cheaper, and you can accomplish more.
>> > e.g
>> > Medication = score *2  (as higher advertising incentives)
>> > Diseases = score
>> > Books = score * 0.75  ( thousands of books, which nobody buys etc..)
>> >
>> > You might also want to try consolidating your data into 1 schema, and
>> > consider layering or collapsing results
>> > based on type.
>> >
>> > P
>> >
>> > On Fri, Apr 17, 2009 at 10:39 AM, theDude_2 <aornstein@webmd.net>
>> wrote:
>> >
>> >>
>> >> (bump) - any thoughts?
>> >> ----
>> >>
>> >>
>> >>
>> >> theDude_2 wrote:
>> >> >
>> >> > hi!
>> >> >
>> >> > I am trying to do something a little unique...
>> >> >
>> >> > I have a 90k text documents that I am trying to search
>> >> > Search A: indexes and searches the documents using regular relevancy
>> >> > search
>> >> > Search B: indexes and searches the documents using a smaller subset
>> of
>> >> > "key" words that I have chosen
>> >> >
>> >> > This gives me 2 seperate scores: Score A, and Score B...
>> >> >
>> >> > I am trying to show the top 10 results of the scores combined so....
>> >> >
>> >> > FinalScoretextDoc = (scoreA_of_td1 * 0.5) * (scoreB_of_td1 * 0.5)
>> >> >
>> >> > While it seems straightforward, I do not want to calculate the
>> scores
>> >> of
>> >> > all the documents outside of lucene.  How can I integrate this
>> better
>> >> into
>> >> > the lucene search engine?  Is this possible to do by any simple
>> means?
>> >> >
>> >> > Thanks guys + gals!
>> >> >
>> >> >
>> >>
>> >> --
>> >> View this message in context:
>> >>
>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23098961.html
>> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>> >>
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23099744.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/A-Challenge%21%3A-Combining-2-searches-into-a-single-resultset--tp23085506p23100423.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message