lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: Multivalued scoring
Date Thu, 16 Dec 2010 00:04:22 GMT
As Ryan mentions, you really should consider putting everything into
a single index, with one document per photo. Yes, it seems wasteful to
re-index the album's author and URL with every photo, but try it and see
whether the index size is acceptable. Or, more accurately, whether
performance is acceptable.
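
To make the single-index idea concrete, here is a rough sketch of what the
search side could look like once every photo is its own document carrying
a copy of its album's fields. It uses plain Java rather than Lucene calls:
PhotoHit, albumId, photoUrl and groupByAlbum are names I made up for
illustration, and the hits are assumed to arrive already sorted by score,
as they would from a normal search. Grouping them by album gives you the
best albums (ranked by their best photo) and the best photos within each.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AlbumGrouping {

    /** One hit from the flattened index: a photo plus its album's id (hypothetical shape). */
    static final class PhotoHit {
        final String albumId;   // assumed field, copied onto every photo document
        final String photoUrl;  // assumed field, unique per photo
        final float score;      // relevance score reported by the search

        PhotoHit(String albumId, String photoUrl, float score) {
            this.albumId = albumId;
            this.photoUrl = photoUrl;
            this.score = score;
        }
    }

    /**
     * Groups hits by album. The input must already be sorted by descending
     * score. Albums come out in order of their best-scoring photo, and each
     * album keeps at most maxPerAlbum photo URLs, best first.
     */
    static Map<String, List<String>> groupByAlbum(List<PhotoHit> rankedHits, int maxPerAlbum) {
        // LinkedHashMap preserves insertion order, i.e. order of first (best) hit per album.
        Map<String, List<String>> albums = new LinkedHashMap<String, List<String>>();
        for (PhotoHit hit : rankedHits) {
            List<String> photos = albums.get(hit.albumId);
            if (photos == null) {
                photos = new ArrayList<String>();
                albums.put(hit.albumId, photos);
            }
            if (photos.size() < maxPerAlbum) {
                photos.add(hit.photoUrl);
            }
        }
        return albums;
    }

    public static void main(String[] args) {
        List<PhotoHit> hits = new ArrayList<PhotoHit>();
        hits.add(new PhotoHit("album-1", "album-1/p3.jpg", 2.1f));
        hits.add(new PhotoHit("album-2", "album-2/p1.jpg", 1.7f));
        hits.add(new PhotoHit("album-1", "album-1/p1.jpg", 1.2f));
        hits.add(new PhotoHit("album-1", "album-1/p2.jpg", 0.4f));

        // Keep the two best photos per album.
        System.out.println(groupByAlbum(hits, 2));
    }
}
```

Ranking an album by its single best photo is only one choice; summing or
averaging the photo scores per album would be a small change to the loop.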


On Wed, Dec 15, 2010 at 11:04 AM, Dennis Hendriksen <> wrote:

> Hi,
> We are using a Lucene 3.x index to search for photo albums based on
> textual properties such as photo album title/author/URL and photo
> captions/URLs. The goal is to find the most relevant photo albums for a user
> query and display the best matching photos for these albums.
> In our current solution we append, for each photo album, all photo
> captions in a catchall field and do a search. For each search result we
> get the URL and do a lookup in another Lucene index where each document
> contains individual photo properties to find the best matching photos in
> a photo album. These indexes are very small, but there is one for each
> photo album. (see example below)
> I've tried to use multivalued fields and exploit the fact that the
> values are ordered so that for example the 2nd value of field:caption is
> related to the 2nd value of field:URL.
> The problem with this solution is that it is not possible to find the
> best matching photos in the best photo albums (scoring of multivalued
> fields is performed on all values together). As a workaround I've tried
> storing some fields and creating the photo indexes on the fly using a
> RAMDirectory. Besides the fact that this is not an elegant solution, the
> increase in index size is unacceptable in our current setup.
> What would be a better way to solve this problem?
> Thanks for your time,
> Dennis
> ---
> Example:
> index: photo albums
> doc1( field:title, field:author, field:url_album, field:id=unique_id )
> doc2( field:title, field:author, field:url_album, field:id=unique_id )
> ...
> index: <unique_id>
> doc1(field:caption, field:url_photo)
> doc2(field:caption, field:url_photo)
> doc3(field:caption, field:url_photo)