lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Antoine Baudoux>
Subject Re: Several questions about scoring/sorting + random sorting in an image/related application
Date Fri, 15 Jun 2007 20:33:22 GMT
Well maybe i didnt explain my problem very well. I have a database  
with over 3 million images, with each image belonging to one out of  
300 possible collections. A query could return more than 100.000  
images (for example if they search for a popular  image keyword).

I want to sort my result with a combination of image date/collection  
scoring : Recent images with high-score collections come first, then  
recent image with lower collection. As the user navigate through the  
images, less recent images start to appear, also sorted by collection  

So imagine i make a query for "nature". This query would return  
100.000 images. Your sugestion is that i dont make any attempt to  
sort the images in Lucene.i just make a query with no sort. I would  
then need to load the 100.000 rows from my database, then sort those  
100.000 rows with my custom-defined ordering. Are-you sure this  
method would be faster than custom Query, or a ValueSource query?  
Since lucene already indexes and caches field values, I'm not very sure.

> Walt explain differently what I said.
> Lucene can be efficiently use for selecting objects, without  
> sorting or scoring anything, then, with id stored in Lucene, you  
> can sort yourself with a simple Sortable implementation.
> The only limit is that lucene gives you not too much results, with  
> your 300 maximal responses, you can play with it easily.
> M.
> Le 15 juin 07 à 19:07, Walt Stoneburner a écrit :
>> Antoine Baudoux writes:
>>> I want to be able to give a score to each collection.
>> Keep in mind, Lucene is computing a score based on quite a number of
>> things from how often a term is used in a document, how often it
>> appears in the collection of documents, how long the query is, etc.
>> If your concept of a document's score changes, then I'd be  
>> inclined to
>> think you're possibly using Lucene in a manner it wasn't designed  
>> for.
>> That said, I have two thoughts.
>> Use Lucene to locate "records" for you --- what you really are
>> interested in getting back _from Lucene_ is the primary key.  Then,
>> use this key to do a lookup in your database of the score of the day
>> and sort accordingly.  The idea is that Lucene finds, your table
>> scores, and because of that you won't need to re-index when something
>> changes.
>> Use boosting.  COLLECTION_ONE^5  COLLECTION_THREE^10  etc.  That way
>> /if/ the Lucene document appears in the collection, it's score is
>> weighted according to your preferences.  You're free to change the
>> boosts on a query-by-query basis without having to re-index.
>>> I can use a Very big ... query ... I am afraid that it will be slow.
>> Try it.  I think you'll find Lucene is _fast_.  We do some pretty  
>> and complicated queries and Lucene just screams.
>>> I can add another field to each document, containing a computed
>>> custom score, then i could sort on that field. But i want to avoid
>>> this solution at all costs, since it would mean re-indexing all the
>>> documents each time the collection scores change.
>> Or, use indirection - instead of keeping the score, keep the primary
>> key of a score table.  Then in a database, where speed won't be the
>> issue, perform the look up.   Honestly, if you're only got 300
>> categories, you could keep that simple table in memory using less
>> space than a small text file.
>>> I would also like to implement random-sorting. ... Is it a good  
>>> solution?
>>> Is there another way to do it?
>> This really, really, really feels like you're force fitting Lucene to
>> do some business logic piece of a larger application.  May I be so
>> bold as to ask what's the _actual_ problem you're trying to solve.
>> ("I'm trying to make a hole in a piece of oak" as opposed to "What's
>> the best way to sharpen a Phillips screwdriver enough to cut wood?")
>> Keep in mind that the forum is for Lucene, so parts of your questions
>> may be answered outside of the forum.
>> -wls
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message