From: "Rob Staveley (Tom)"
Subject: RE: Seeing what's occupying all the space in the index
Date: Fri, 26 May 2006 19:11:55 +0100
In-Reply-To: <44773CE5.3010209@syr.edu>
Message-Id: <20060526181154.6E449187A9@mail.seseit.com>

Interesting. I am explicitly turning on the compound file format when I start my application, but I am suspicious about my optimizing thread. It *ought* to be optimising every 30 minutes, using thread synchronisation to prevent the writer from trying to write while optimisation takes place, but it is possible that I'm screwing up there (I'll add some diagnostics to check that optimisation and index writing are mutually exclusive).

When I stopped my daemon and manually optimised, it took 11 minutes to optimise the index.

Is it your understanding that the .fdt, .frq and .prx files are working files pre-optimisation, and that when optimize() is called they should all get absorbed into the .cfs?

Manual optimisation only clawed back 1 GB, but I didn't look to see whether the .fdt, .frq and .prx files were absorbed into the .cfs files in the process. I'll investigate that now.

> Can you try a smaller sample in a clean directory and see what size it is (so that it doesn't take as long to index)?

I'll try teeing off the message feed and indexing it into a new index. I'm working with a live message feed.

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@syr.edu]
Sent: 26 May 2006 18:38
To: java-user@lucene.apache.org
Subject: Re: Seeing what's occupying all the space in the index

It seems odd to me that, if you are using the CFS format, you would have the .fdt, .frq and .prx files in addition to the .cfs files. My understanding is all files (except deletable and segment) get put inside of the CFS file. Looking at my indices, I only have the CFS file.

Are you optimizing your indices after you are done indexing? Are you turning off compound file format?

Can you try a smaller sample in a clean directory and see what size it is (so that it doesn't take as long to index)?
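One way to see which file types account for the space in an index directory, and whether loose .fdt, .frq and .prx files remain next to the .cfs files after optimising, is to total the file sizes per extension. The following is only a minimal sketch using the standard JDK (java.nio.file, so a newer JDK than was current for this thread). The class name IndexSpaceReport and its method are hypothetical and are not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: totals file sizes per extension.
final class IndexSpaceReport {

    /** Sums the sizes of the regular files directly inside dir, grouped by extension. */
    static Map<String, Long> bytesByExtension(Path dir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and special files
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files without a dot (e.g. "segments") are grouped under "(none)".
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is passed as the first argument; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        bytesByExtension(dir).forEach((ext, bytes) ->
                System.out.printf("%-8s %,15d bytes%n", ext, bytes));
    }
}
```

Running it against the index directory before and after a manual optimise would show whether the space sits in the .cfs files or in the loose per-segment files.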
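The diagnostics mentioned in the reply (checking that optimisation and index writing are mutually exclusive) could be sketched along the following lines. This is an assumption about how such a check might look, not code from the thread: OverlapDetector is a hypothetical helper, the indexLock object stands in for whatever synchronisation the application already uses, and the empty Runnable bodies stand in for the application's own document-adding and optimize() calls. The helper only detects overlap; it does not enforce exclusion.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical diagnostic helper: counts index operations that start while another is running.
final class OverlapDetector {
    // Number of index operations currently in progress.
    private final AtomicInteger inProgress = new AtomicInteger();
    // Number of times an operation started while another was still running.
    private final AtomicInteger overlaps = new AtomicInteger();

    /** Runs the task, recording an overlap if another task is already in progress. */
    void run(String label, Runnable task) {
        if (inProgress.incrementAndGet() > 1) {
            overlaps.incrementAndGet();
            System.err.println("OVERLAP: " + label + " started during another index operation");
        }
        try {
            task.run();
        } finally {
            inProgress.decrementAndGet();
        }
    }

    int overlapCount() {
        return overlaps.get();
    }

    public static void main(String[] args) throws InterruptedException {
        OverlapDetector detector = new OverlapDetector();
        Object indexLock = new Object(); // stands in for the application's own lock

        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                synchronized (indexLock) {
                    detector.run("write", () -> { /* add documents here */ });
                }
            }
        });
        Thread optimizer = new Thread(() -> {
            for (int i = 0; i < 100; i++) {
                synchronized (indexLock) {
                    detector.run("optimize", () -> { /* call optimize() here */ });
                }
            }
        });
        writer.start();
        optimizer.start();
        writer.join();
        optimizer.join();
        System.out.println("overlaps detected: " + detector.overlapCount());
    }
}
```

As long as both threads really do take the same lock, the counter stays at zero. A non-zero count in the daemon's logs would suggest that the writer and the optimising thread are not synchronising on the same object.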