From: "Rob Staveley (Tom)"
Reply-To: java-user@lucene.apache.org
Subject: RE: Seeing what's occupying all the space in the index
Date: Fri, 26 May 2006 18:14:04 +0100

> Is there anything I can learn from the index directory's file listing?

Running this nasty little Bash one-liner...

$ for i in `ls * | perl -nle 'if (/^.+(\..+)/) {print $1;}' | sort | uniq`; do ls -l *$i | awk '{SUM = SUM + $5} END {if (SUM > 1e10) {print "'$i': ", SUM}}'; done

... I see:

.cfs: 1.23155e+10
.fdt: 5.06108e+10
.frq: 1.27472e+10
.prx: 1.3444e+10

That means I have 98 GB of files, with:

51 GB devoted to field data (.fdt)
13 GB devoted to term positions (.prx)
13 GB devoted to term frequencies (.frq)
12 GB devoted to compound segment files (.cfs)

Does that seem reasonable, bearing in mind that I have indexed only 4.3 million Lucene documents? That's 22.8 kB per Lucene document. Apart from a 300-character synopsis, the fields are all well under 100 characters long, and yet this suggests that the index is using about 600 bytes per field.
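For comparison, here is a rough sketch of the same per-extension tally as a small Bash script that avoids parsing ls output and also reports the per-document average. It assumes a flat index directory (no subdirectories worth counting) and a POSIX awk; the script name index-sizes.sh and the functions sizes_by_extension and per_document_kb are hypothetical names for illustration, not anything shipped with Lucene.

```bash
#!/usr/bin/env bash
# Sketch: sum index file sizes per extension without parsing `ls`.
# Assumptions: flat index directory, POSIX awk. Function names are
# hypothetical, not Lucene tools.
set -euo pipefail

# Print "<extension><TAB><bytes>" for each extension plus a TOTAL line,
# sorted largest first.
sizes_by_extension() {
  local dir=$1 f name ext size
  for f in "$dir"/*; do
    [ -f "$f" ] || continue              # skip subdirectories / unmatched glob
    name=${f##*/}                        # basename, e.g. "_1a.fdt"
    case $name in
      *.*) ext=".${name##*.}" ;;         # text after the last dot
      *)   ext="(none)" ;;               # e.g. "segments", "deletable"
    esac
    size=$(( $(wc -c < "$f") ))          # byte count; $(( )) trims BSD padding
    printf '%s\t%s\n' "$ext" "$size"
  done | awk -F'\t' '
    { sum[$1] += $2; total += $2 }
    END {
      # %.0f rather than %d: some older awks clamp %d to 32 bits
      for (e in sum) printf "%s\t%.0f\n", e, sum[e]
      printf "TOTAL\t%.0f\n", total
    }' | sort -t $'\t' -k2,2nr
}

# Average index bytes per document, in kB (1 kB = 1000 bytes).
per_document_kb() {
  awk -v total="$1" -v docs="$2" 'BEGIN { printf "%.1f\n", total / docs / 1000 }'
}

# Usage: ./index-sizes.sh /path/to/index [document-count]
if [ "$#" -ge 1 ]; then
  report=$(sizes_by_extension "$1")
  printf '%s\n' "$report"
  if [ "$#" -ge 2 ]; then
    total=$(awk -F'\t' '$1 == "TOTAL" { print $2 }' <<< "$report")
    echo "$(per_document_kb "$total" "$2") kB per document"
  fi
fi
```

Running ./index-sizes.sh /path/to/index 4300000 would print one line per extension, largest first, followed by the average. As a check on the arithmetic above, per_document_kb 98000000000 4300000 prints 22.8. Unlike the one-liner, this version does not hide extensions totalling under 10 GB, so the smaller files show up in the listing too.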