From: Bruce Ritchie
Organization: Jive Software
Date: Tue, 16 Sep 2003 13:54:18 -0400
To: Lucene Developers List
Subject: Re: Caching filter wrapper (was Re: RE : DateFilter.Before/After)

Doug Cutting wrote:

>> for (int i = 0; i < numResults; i++) {
>>     ids[i] = Long.parseLong((hits.doc(i)).get("messageID"));
>> }
>
> This is not a recommended way to use Lucene.
> The intent is that you should only have to call Hits.doc() for documents
> that you actually display, usually around 10 per query. Is this still a
> bottleneck when you fetch a max of 10 or 20 documents?

I didn't test this case.

> So I'd be interested to hear why you need 1500 hits. My guess is that
> you're doing post-processing of hits, then selecting 10 or so to
> actually display. If you can figure out a way to do this post-processing
> without accessing the document object, i.e., through the query, a custom
> HitCollector, or the SearchBean, then this optimization is probably not
> needed.

We would dearly love not to have to post-process the results returned from Lucene. Unfortunately, we can't foresee a way to avoid it given the current architecture of our applications and of Lucene. The issue is that we must both exclude search results based on a permission system that is external to Lucene and sort results by criteria that likewise can't be stored inside Lucene (document rating is an example).

Neither the permissions nor the external sort criteria can be stored in Lucene, because they either affect too many documents when they change (a single permission change could require 'updating' a field in every document in the Lucene store) or change too often (it's quite probable that a document's rating will change every time the document is viewed, for example).

The only way I foresee that we could internalize both of these factors into Lucene is if it were possible to modify a document inside Lucene at basically no cost. Since that's not currently possible, we are stuck with retrieving all the documents from Lucene and post-processing them. Even if updating a document were possible, we might decide that it's just not worth storing some document attributes in Lucene from an overall performance perspective.
There may of course be other possible solutions; however, we haven't yet thought of them.

> A 30% optimization to a slow algorithm is better than nothing, but it
> would be better yet to improve the algorithm. That said, this sort of
> improvement is not always trivial, and lots of people use Lucene in the
> way that you have, so it may still be worth optimizing this.

That 30% was measured on my machine; I think the speedup is likely to be quite a bit larger when the Lucene files are striped across multiple disks. I can't test that assumption, though, as I don't have the hardware available. I believe the speedup is beneficial in almost all situations and the cost associated with the optimization is quite minimal, especially when compared to the alternatives (slow searches under heavy load, or more memory usage and file descriptors through multiple readers).

Regards,

Bruce Ritchie
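
A minimal sketch of the post-processing described above, assuming the message IDs have already been read from the hits in relevance order: drop the IDs the user may not see, order the rest by an externally stored rating, and keep one page for display. It is written in present-day Java rather than anything from the thread, and `postProcess`, `canView`, `ratings`, and `pageSize` are hypothetical names for illustration, not Lucene or Jive APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.LongPredicate;

class PostProcessSketch {

    /**
     * Filters message IDs through an external permission check, sorts the
     * survivors by an external rating (highest first), and returns one page.
     * IDs with no rating are treated as 0.0 (an assumption of this sketch).
     */
    static List<Long> postProcess(long[] ids,
                                  LongPredicate canView,
                                  Map<Long, Double> ratings,
                                  int pageSize) {
        List<Long> visible = new ArrayList<>();
        for (long id : ids) {
            if (canView.test(id)) {          // permission lives outside the index
                visible.add(id);
            }
        }
        // List.sort is stable, so equally rated IDs keep their relevance order.
        visible.sort(Comparator
                .comparingDouble((Long id) -> ratings.getOrDefault(id, 0.0))
                .reversed());
        return new ArrayList<>(visible.subList(0, Math.min(pageSize, visible.size())));
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L, 4L};                        // IDs in relevance order
        Map<Long, Double> ratings = Map.of(1L, 1.0, 3L, 5.0, 4L, 3.0);
        // User may see everything except message 2.
        System.out.println(postProcess(ids, id -> id != 2L, ratings, 2)); // prints [3, 4]
    }
}
```

Only the IDs are needed for this step, which is why the cost of obtaining them for every hit dominates when the result set runs to around 1500 documents.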