Date: Wed, 28 Dec 2011 20:16:08 +0200
Subject: Re: couchdb disk storage format - why so large overhead?
From: Alexey Loshkarev
To: user@couchdb.apache.org

>
> Also, I just realized here
> (http://www.erlang.org/doc/apps/erts/erl_ext_dist.html), cite:
> ===============
> A float is stored in string format. the format used in sprintf to
> format the float is "%.20e" (there are more bytes allocated than
> necessary)
> ===============
> So, every float requires 33 bytes of disk space. Not so efficient.

Reading the spec, I realized that passing {minor_version, 1} in the
term_to_binary options makes each float 9 bytes long instead of 33.
The change is just a search/replace in a few files.

I tested my data blob with this map function:

fun({Doc}) ->
    Emit(<<"raw">>, size(term_to_binary(Doc))),
    Emit(<<"raw_1">>, size(term_to_binary(Doc, [{minor_version, 1}])))
end.

The result was a ~10% decrease in space usage (~600MB instead of ~700MB).

Do I need to file a bug in Jira for it?

-- 
----------------
Best regards
Alexey Loshkarev
mailto:elf2001@gmail.com
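For reference, the per-float sizes discussed above can be reproduced from the layout in the external term format document. The following is a minimal sketch in Python, not Erlang's own implementation: the function names are made up for illustration, and only the tag values (99 for FLOAT_EXT, 70 for NEW_FLOAT_EXT, 131 for the leading version byte) and field widths come from the spec.

```python
import struct

VERSION_MAGIC = 131  # byte that term_to_binary puts in front of every encoded term
FLOAT_EXT = 99       # minor_version 0: float stored as a 31-byte text field
NEW_FLOAT_EXT = 70   # minor_version 1: float stored as an 8-byte IEEE 754 double


def encode_float_ext(value: float) -> bytes:
    """Hypothetical helper: tag byte + "%.20e" text, zero-padded to 31 bytes."""
    text = ("%.20e" % value).encode("ascii")
    return bytes([FLOAT_EXT]) + text.ljust(31, b"\x00")


def encode_new_float_ext(value: float) -> bytes:
    """Hypothetical helper: tag byte + big-endian 8-byte double."""
    return bytes([NEW_FLOAT_EXT]) + struct.pack(">d", value)


def standalone_size(encoded: bytes) -> int:
    """Size of a lone term as term_to_binary would report it (adds the version byte)."""
    return len(bytes([VERSION_MAGIC]) + encoded)


if __name__ == "__main__":
    old = encode_float_ext(3.14)
    new = encode_new_float_ext(3.14)
    print("FLOAT_EXT:     %d bytes in a term, %d standalone" % (len(old), standalone_size(old)))
    print("NEW_FLOAT_EXT: %d bytes in a term, %d standalone" % (len(new), standalone_size(new)))
```

Under this layout the text encoding costs 32 bytes per float inside a larger term (33 if the float is encoded on its own, counting the version byte), and the binary encoding costs 9 bytes (10 on its own). So the 33 figure appears to include the version byte while the 9 figure does not, and the saving per float inside a document would be 23 bytes.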