From: Antoine Baudoux
Subject: Re: Several questions about scoring/sorting + random sorting in an image/related application
Date: Fri, 15 Jun 2007 22:33:22 +0200
To: java-user@lucene.apache.org

Well, maybe I didn't explain my problem very well. I have a database with over 3 million images, each belonging to one of 300 possible collections. A query could return more than 100,000 images (for example, if users search for a popular image keyword).

I want to sort my results by a combination of image date and collection score: recent images from high-scoring collections come first, then recent images from lower-scoring collections. As the user navigates through the images, less recent images start to appear, also sorted by collection score.

So imagine I make a query for "nature". This query would return 100,000 images. Your suggestion is that I make no attempt to sort the images in Lucene; I just run a query with no sort. I would then need to load the 100,000 rows from my database and sort them with my custom-defined ordering. Are you sure this method would be faster than a custom Query or a ValueSource query? Since Lucene already indexes and caches field values, I'm not so sure.

> Walt explained what I said in a different way.
> Lucene can be used efficiently for selecting objects, without
> sorting or scoring anything; then, with the id stored in Lucene, you
> can sort yourself with a simple Sortable implementation.
> The only limit is that Lucene must not give you too many results; with
> your 300 maximum responses, you can handle it easily.
>
> M.
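A minimal sketch of the "sort it yourself" approach Mathieu and Walt describe, assuming that for each hit you get back a primary key, the image date and the collection id, and that the 300 collection scores sit in an in-memory map as Walt suggests. The names `CollectionAwareSort`, `ImageHit` and `sortHits`, and the choice to compare dates per day (otherwise the collection score would almost never break a tie), are illustrative assumptions, not part of Lucene's API. It shows the ordering only; it says nothing about whether this beats a custom Query or ValueSource on 100,000 hits, which would need measuring.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class CollectionAwareSort {

    // Hypothetical shape of one hit: the primary key stored in Lucene, the
    // image date (compared per day, an assumption) and the owning collection.
    record ImageHit(String id, LocalDate date, String collectionId) {}

    // Newest day first; within the same day, higher collection score first.
    // Collections missing from the score table count as 0. The id is a final
    // tie-breaker so the order is identical from one page request to the next.
    static List<ImageHit> sortHits(List<ImageHit> hits, Map<String, Double> collectionScores) {
        Comparator<ImageHit> byDateDesc = Comparator.comparing(ImageHit::date).reversed();
        Comparator<ImageHit> byCollectionScoreDesc = Comparator.comparingDouble(
                (ImageHit h) -> collectionScores.getOrDefault(h.collectionId(), 0.0)).reversed();
        List<ImageHit> sorted = new ArrayList<>(hits); // copy: the input may be immutable
        sorted.sort(byDateDesc.thenComparing(byCollectionScoreDesc).thenComparing(ImageHit::id));
        return sorted;
    }

    public static void main(String[] args) {
        // Scores for two of the ~300 collections, kept in memory (hypothetical values).
        Map<String, Double> scores = Map.of("c1", 0.9, "c2", 0.4);
        List<ImageHit> hits = List.of(
                new ImageHit("a", LocalDate.of(2007, 6, 14), "c2"),
                new ImageHit("b", LocalDate.of(2007, 6, 15), "c2"),
                new ImageHit("c", LocalDate.of(2007, 6, 15), "c1"),
                new ImageHit("d", LocalDate.of(2007, 6, 14), "c1"));
        for (ImageHit h : sortHits(hits, scores)) {
            System.out.println(h.id());
        }
    }
}
```

Run as-is, `main` prints c, b, d, a: the two 15 June images first (higher-scoring collection first), then the two 14 June ones. Changing a collection score only means updating the map, with no re-indexing.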
> On 15 Jun 07, at 19:07, Walt Stoneburner wrote:
>
>> Antoine Baudoux writes:
>>> I want to be able to give a score to each collection.
>>
>> Keep in mind, Lucene is computing a score based on quite a number of
>> things: how often a term is used in a document, how often it
>> appears in the collection of documents, how long the query is, etc.
>>
>> If your concept of a document's score changes, then I'd be inclined to
>> think you're possibly using Lucene in a manner it wasn't designed for.
>> That said, I have two thoughts.
>>
>> THOUGHT ONE
>> Use Lucene to locate "records" for you --- what you really are
>> interested in getting back _from Lucene_ is the primary key. Then,
>> use this key to look up the score of the day in your database
>> and sort accordingly. The idea is that Lucene finds, your table
>> scores, and because of that you won't need to re-index when something
>> changes.
>>
>> THOUGHT TWO
>> Use boosting: COLLECTION_ONE^5 COLLECTION_THREE^10 etc. That way,
>> /if/ the Lucene document appears in the collection, its score is
>> weighted according to your preferences. You're free to change the
>> boosts on a query-by-query basis without having to re-index.
>>
>>
>>> I can use a very big ... query ... I am afraid that it will be slow.
>> Try it. I think you'll find Lucene is _fast_. We do some pretty HUGE
>> and complicated queries and Lucene just screams.
>>
>>
>>> I can add another field to each document, containing a computed
>>> custom score, then I could sort on that field. But I want to avoid
>>> this solution at all costs, since it would mean re-indexing all the
>>> documents each time the collection scores change.
>> Or, use indirection - instead of keeping the score, keep the primary
>> key of a score table. Then in a database, where speed won't be the
>> issue, perform the look up.
>> Honestly, if you've only got 300 categories, you could keep that
>> simple table in memory using less space than a small text file.
>>
>>
>>
>>> I would also like to implement random sorting. ... Is it a good
>>> solution?
>>> Is there another way to do it?
>>
>> This really, really, really feels like you're force-fitting Lucene to
>> do some business logic piece of a larger application. May I be so
>> bold as to ask what's the _actual_ problem you're trying to solve?
>> ("I'm trying to make a hole in a piece of oak" as opposed to "What's
>> the best way to sharpen a Phillips screwdriver enough to cut wood?")
>>
>> Keep in mind that the forum is for Lucene, so parts of your questions
>> may be answered outside of the forum.
>>
>> -wls
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
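Walt's THOUGHT TWO can be sketched as a query string in Lucene's classic query-parser syntax, where `+` marks a required clause and `term^n` sets a boost. The field name `collection`, the class and method names, and the assumption that collection ids contain no query-syntax characters and pass through the analyzer unchanged are all hypothetical; the sketch only builds the string that would then be handed to a query parser.

```java
import java.util.Map;
import java.util.TreeMap;

class BoostedQueryBuilder {

    // Builds "+(userQuery) collection:ID^boost ...": the user's query is
    // required, and each optional collection clause adds to the score of the
    // documents it matches, weighted by that collection's boost.
    // The "collection" field name is a hypothetical choice.
    static String build(String userQuery, Map<String, Float> collectionBoosts) {
        StringBuilder sb = new StringBuilder("+(").append(userQuery).append(')');
        // TreeMap only makes the clause order deterministic; it does not affect scoring.
        for (Map.Entry<String, Float> e : new TreeMap<>(collectionBoosts).entrySet()) {
            sb.append(" collection:").append(e.getKey()).append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = Map.of("COLLECTION_ONE", 5f, "COLLECTION_THREE", 10f);
        System.out.println(build("nature", boosts));
    }
}
```

This prints `+(nature) collection:COLLECTION_ONE^5.0 collection:COLLECTION_THREE^10.0`. With 300 collections the query gains at most 300 optional clauses, which stays under Lucene's default limit of 1024 clauses per BooleanQuery, and the boosts can be changed per query without re-indexing.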
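For the random-sorting question, one possible approach (an assumption here, not the solution Antoine elided) is to shuffle the hit ids in the application with a fixed seed, so that paging stays consistent: the same seed applied to the same input list always yields the same permutation. Keeping one seed per user session is a hypothetical choice, as are the class and method names; the sketch also assumes the hits arrive in the same order on every request.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SeededShuffle {

    // Returns a random permutation of the hit ids. Reusing the same seed
    // reproduces the same order, so successive pages neither repeat nor skip images.
    static List<String> shuffle(List<String> hitIds, long seed) {
        List<String> copy = new ArrayList<>(hitIds); // copy: the input may be immutable
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    // Returns one page (0-based index) of the shuffled list; past the end it is empty.
    static List<String> page(List<String> hitIds, long seed, int pageIndex, int pageSize) {
        List<String> all = shuffle(hitIds, seed);
        int from = Math.min(pageIndex * pageSize, all.size());
        int to = Math.min(from + pageSize, all.size());
        return all.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("img1", "img2", "img3", "img4", "img5");
        long sessionSeed = 42L; // hypothetical per-session seed
        System.out.println(page(ids, sessionSeed, 0, 2));
        System.out.println(page(ids, sessionSeed, 1, 2));
    }
}
```

Drawing a new seed gives a new random order; reusing it keeps pages 1, 2, 3 and so on disjoint, since they are slices of one fixed permutation.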