lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doron Cohen <DOR...@il.ibm.com>
Subject Re: Integrate Lucene search facilities with existing databases
Date Thu, 24 May 2007 06:18:23 GMT

Huajing Li wrote:

> I am working on an application that must deal with ranking on highly
dynamic
> metadata. For example, suppose I want to provide ranking based on the
number
> of downloads of hit documents. A user may log-in to the system and send a
> query, which will be answered by Lucene in a traditional way. The
relevant
> documents will be scored and ranked, based on default Lucene scoring
> functions. In addition, the system wants to support users with the
> "popularity" ranking facility. The number of downloads for a document may
> continue to increase, even during a query. It will incur much overhead if
we
> put the "popularity" as a field in the Lucene index (delete and insert
the
> document when an update happens). Instead, we choose to store such
> information in a database, with document identifiers linking database
> records back to the index.
>
>
>
> This setting, however, creates a ranking problem. It is not efficient to
> send each hit document identifier to the database as a SQL query to
obtain
> the download popularity information. It will be good to have the
mechanism
> to link database records directly with Lucene indices, for which a query
can
> retrieve corresponding records from the database at the querying time. We
> are very interested to know if there is some open source toolkits or
> libraries to do the dirty-works. Of course, we also want to know other
> alternative solutions to meet our ends.

The problem seems to map to "function query" - Lucene-446.
With that patch, you could create a ValueSourceQuery, based on these
external
values, and then combine it into the score of another (any) query using
CustomScoreQuery. You can override the customScore() method to combine the
scores as fits your needs.

There is one catch though - that approach assumes that values obtained from
the ValueSource match Lucene's internal docids. I think this aspect is
going
to be an issue in any solution that combines external values into the
scoring
process. If your system does not delete documents (or if you are using the
patch from LUCENE-879 (which I didn't try yet)), and, you maintain the
order
by which docs are added, this may work for you.

HTH,
Doron


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message