Date: Tue, 18 Jan 2011 12:13:05 +0000
Subject: Large .frq file
From: dan sutton
To: java-user

Hi,

We're trying to create a large index via Solr for trends, and we notice that we end up with a large '.frq' file after doing the following:

Make all text fields indexed="true", stored="false", omitTermFreqAndPositions="true", omitNorms="true", termPositions="false", termOffsets="false", termVectors="false"

We are using a variation on org.apache.lucene.analysis.cjk and notice that the .frq file is about 4 times larger than it is with, for example, the WhitespaceTokenizer.

Given that omitTermFreqAndPositions="true" is set for the text fields, I'd have expected the file to hold only the reduced form described in the file formats documentation: "If omitTf were true it would be this sequence of VInts instead:" (http://lucene.apache.org/java/2_9_1/fileformats.html#Frequencies)

Can anyone suggest how I can reduce the size of this file?

Many thanks,
Dan

Lucene Specification Version: 2.9.1
Solr Specification Version: 1.4.0.2010.09.10.17.10.36
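
For reference, the TermFreqs encoding that the quoted file formats page describes can be sketched roughly as below. This is a minimal illustration, not Lucene's actual writer: the class and method names (FrqSketch, termFreqs, vIntBytes) are hypothetical, and it models only the sequence of VInt values for a single term, ignoring skip data and everything else in the file. The sample postings (one occurrence in document 7, three in document 11) are the example used on that page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the .frq TermFreqs layout for ONE term (not Lucene code).
class FrqSketch {

    /** Returns the logical VInt values written for a term's postings (docs ascending). */
    static List<Integer> termFreqs(int[] docs, int[] freqs, boolean omitTf) {
        List<Integer> out = new ArrayList<>();
        int prev = 0;
        for (int i = 0; i < docs.length; i++) {
            int delta = docs[i] - prev;   // gap to the previous document number
            prev = docs[i];
            if (omitTf) {
                out.add(delta);           // only the doc delta is stored
            } else if (freqs[i] == 1) {
                out.add(delta * 2 + 1);   // odd value: frequency is exactly one
            } else {
                out.add(delta * 2);       // even value: frequency follows
                out.add(freqs[i]);
            }
        }
        return out;
    }

    /** Number of bytes a non-negative value takes as a VInt (7 payload bits per byte). */
    static int vIntBytes(int value) {
        int n = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        int[] docs = {7, 11};
        int[] freqs = {1, 3};
        System.out.println(termFreqs(docs, freqs, false)); // [15, 8, 3]
        System.out.println(termFreqs(docs, freqs, true));  // [7, 4]
    }
}
```

Under this model, omitting term frequencies still leaves one doc delta per (term, document) pair, so the file size would be expected to track the number of distinct terms each document produces; that may be relevant when comparing analyzers.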