From: Bruce Ritchie
Organization: Jive Software
To: Lucene Users List
Date: Wed, 18 Feb 2004 09:38:34 -0500
Message-ID: <403378EA.6000101@jivesoftware.com>
Subject: Re: MoreLikeThis Query generator - Re: code for "more like this" query "expansion" - was - Re: setMaxClauseCount ??
In-Reply-To: <403322AB.2050105@tropo.com>

David Spencer wrote:

> [c] "interesting words" - uses code from MoreLikeThis to give a table of
> all interesting words in the current "source" doc, ordered by score.
> Remember score is idf*tf as per Doug's mail (and as per my hopefully
> correct understanding of these things). This page is of course more of
> a debugging tool than something a normal user would see. One possible
> area of improvement that jumped out at me after reviewing this table is
> using stemming, say, allowing more words in the generated query when 2
> words have the same stem.

Actually, the analyzer should do that, shouldn't it? For example, I have stemming analyzers for a variety of languages that both stem and remove stop words - it seems silly to me to duplicate that functionality when the analyzer provides it so easily. Given that, I would suggest removing the stop word functionality from this class, as it is not needed and only confuses things.
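To make the point concrete, here is a minimal sketch in plain Java of ranking "interesting words" by tf*idf when stemming and stop-word removal are left entirely to the analysis step. Everything in it is an assumption for illustration: the class InterestingWords, the methods analyze, score and topTerms, the tiny stop list, and the crude plural-stripping "stemmer" are all hypothetical and are not the MoreLikeThis code or any Lucene API. The idf formula log(numDocs / (docFreq + 1)) + 1 and the use of raw term counts for tf are likewise assumed here, not taken from this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class InterestingWords {
    // Hypothetical stop list, for illustration only.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    // Toy stand-in for an analyzer: lower-cases, drops stop words, and
    // strips a trailing "s" from words longer than three characters.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            String stem = (token.length() > 3 && token.endsWith("s"))
                    ? token.substring(0, token.length() - 1)
                    : token;
            terms.add(stem);
        }
        return terms;
    }

    // Scores every analyzed term of the source text as tf * idf.
    // Note that no stop-word or stemming logic appears here.
    static Map<String, Double> score(String source, List<String> corpus) {
        // Document frequency: how many corpus documents contain each term.
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : corpus) {
            for (String term : new HashSet<>(analyze(doc))) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        // Term frequency within the source text.
        Map<String, Integer> termFreq = new HashMap<>();
        for (String term : analyze(source)) {
            termFreq.merge(term, 1, Integer::sum);
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreq.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            double idf = Math.log((double) corpus.size() / (df + 1)) + 1.0; // assumed formula
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return scores;
    }

    // Returns up to max terms, highest score first, ties broken alphabetically.
    static List<String> topTerms(String source, List<String> corpus, int max) {
        Map<String, Double> scores = score(source, corpus);
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return terms.subList(0, Math.min(max, terms.size()));
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "The index stores documents",
                "Queries search the index",
                "Stemming analyzers reduce words to stems");
        String source = "The analyzer stems words and the analyzers remove stop words";
        System.out.println(topTerms(source, corpus, 3)); // prints [analyzer, word, remove]
    }
}
```

Because "analyzer" and "analyzers" are merged before counting, the scoring code never needs its own stop list or stemmer; swapping in a different analysis step changes the terms without touching the ranking.
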
Regards,

Bruce Ritchie
http://www.jivesoftware.com/