Date: Wed, 4 Nov 2015 14:53:23 -0700 (MST)
From: Chris Hostetter
To: java-user@lucene.apache.org
Subject: Re: sizes of non-fdt flies affected by compression settings

: This setting can only affect the size of the fdt (and fdx) files. I suspect
: you saw differences in the size of other files because it caused Lucene to
: run different merges (because segments had different sizes), and the
: compression that we use for postings/terms worked better, but it could have
: been the other way as well.

You can check the number of documents in each segment to verify Adrien's
comments.

If you want to do a true "apples to apples" comparison on just the impacts
of stored field compression, choose something like the NoMergePolicy or
LogDocMergePolicy for your test to ensure that the number of documents per
segment is not impacted by the size (in bytes) of any of the files in those
segments.

: > Hello,
: >
: > I'm experimenting with Lucene 5.2.1 and I see something I cannot find an
: > easy explanation for in the api docs.
: > Depending on whether I pick BEST_COMPRESSION or BEST_SPEED mode for
: > StoredFieldsFormat almost all files become smaller for BEST_COMPRESSION
: > mode. I expected only .fdt files to be smaller but for some reason the
: > following file types also shrink very significantly:
: > .fdx, .doc, .pos. Term dictionary (.tim) also gets smaller though not as
: > significantly. Weirdly enough .tip becomes a little bigger for the best
: > compressions setting.
: > Index contained about 10M small (~300 bytes each) text docs.
: >
: > I guess I could go through the code myself to understand this but may be
: > someone can shed some light on this.
: >
: > Thanks!
: >
: > Anton
: >
:

-Hoss
http://www.lucidworks.com/
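
For comparing the two test indexes file type by file type, a small sketch along these lines might help. It uses only the Java standard library and simply sums on-disk sizes per file extension (.fdt, .fdx, .doc, .pos, .tim, .tip, ...) for two index directories. It does not read any segment metadata, so verifying documents per segment would still need Lucene's own tooling. The class name and the two directory arguments are hypothetical, not anything from the thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class IndexSizeByExtension {

    // Sums the sizes (in bytes) of the regular files directly inside
    // indexDir, grouped by file extension. Files without a dot in their
    // name (for example segments_N) are grouped under "(none)".
    static Map<String, Long> sizeByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = dot >= 0 ? name.substring(dot) : "(none)";
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: one directory per compression mode under test.
        if (args.length != 2) {
            System.err.println("usage: IndexSizeByExtension <indexDirA> <indexDirB>");
            return;
        }
        Map<String, Long> a = sizeByExtension(Paths.get(args[0]));
        Map<String, Long> b = sizeByExtension(Paths.get(args[1]));

        // Print every extension seen in either index, side by side.
        Set<String> extensions = new TreeSet<>(a.keySet());
        extensions.addAll(b.keySet());
        for (String ext : extensions) {
            System.out.printf("%-8s %12d %12d%n",
                    ext, a.getOrDefault(ext, 0L), b.getOrDefault(ext, 0L));
        }
    }
}
```

Run against two indexes built with a merge policy that is not size-driven, any remaining differences outside .fdt/.fdx would be worth a closer look.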