Date: Fri, 9 Feb 2007 11:52:23 +0000 (GMT)
From: mark harwood
Subject: Re: Reduction based "more like this"?
To: java-user@lucene.apache.org
Message-ID: <181449.5358.qm@web26011.mail.ukl.yahoo.com>

The distinguishing characteristics you mark out and put in a field may not be so distinguishing as more content is added to an index (e.g. as new terminology like "podcast" becomes more prevalent). Maintaining/regenerating this field in anything other than a static index then starts to look like a non-trivial overhead.

While we are musing on this, I'm not sure that with things like MoreLikeThis (or the BooleanQuery scoring?) we have considered the true value of *coincidences* of terms rather than independently summing their individual IDFs. For example, given terms "female", "John" and "London" - all 3 may have equal IDF, but should a document representing a female in London be given equal weighting to a document representing the rarer example of a female who happens to be called "John"? Considering these pairings adds extra complexity/cost but might be an interesting avenue to explore for some apps when selecting distinguishing characteristics or weighting query results.

Cheers
Mark

----- Original Message ----
From: karl wettin
To: java-user@lucene.apache.org
Sent: Friday, 9 February, 2007 8:31:05 AM
Subject: Reduction based "more like this"?

I just woke up thinking it would be cool to attempt reducing the data of all documents using PCA (or so) and store the output in a new field per dimension introduced in order to find similar documents by placing a simple proximity query. Did anyone attempt something like this?

I did not think this through that much.
Nor do I need this feature. Just think it would be a cool experiment.

--
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
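
The point in the reply about *coincidences* of terms versus independently summed IDFs can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy corpus is invented, the names doc_freq, idf and summed_idf are hypothetical helpers, and the formula log(N / df) is a simplified IDF rather than the exact one Lucene uses. The corpus is built so that "female", "john" and "london" each occur in the same number of documents, while the pair female+john co-occurs far less often than female+london.

```python
import math

# Toy corpus (invented data): each document is a set of terms.
# "female", "john" and "london" each appear in exactly 4 of the 8 documents.
DOCS = [
    {"female", "london"},
    {"female", "london"},
    {"female", "london"},
    {"female", "john"},
    {"male", "john"},
    {"male", "john"},
    {"male", "john"},
    {"male", "london"},
]


def doc_freq(terms, docs=DOCS):
    """Count the documents that contain every term in `terms`."""
    wanted = set(terms)
    return sum(1 for doc in docs if wanted <= doc)


def idf(terms, docs=DOCS):
    """Simplified IDF, log(N / df), for one term or a co-occurring group."""
    df = doc_freq(terms, docs)
    if df == 0:
        return float("inf")  # the terms never occur together in this corpus
    return math.log(len(docs) / df)


def summed_idf(terms, docs=DOCS):
    """Score a group of terms by adding up their individual IDFs."""
    return sum(idf([term], docs) for term in terms)


if __name__ == "__main__":
    for pair in (("female", "london"), ("female", "john")):
        print(pair, "summed:", round(summed_idf(pair), 3),
              "joint:", round(idf(pair), 3))
```

With this corpus the summed score is identical for both pairs, because the three terms have equal document frequency, whereas the joint IDF is higher for the rarer female+john pairing. Computing document frequencies for term groups is where the extra complexity/cost mentioned above comes from: the number of candidate pairs grows quadratically with the number of terms considered.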