couchdb-user mailing list archives

From Alexey Loshkarev <elf2...@gmail.com>
Subject Re: couchdb disk storage format - why so large overhead?
Date Wed, 28 Dec 2011 18:16:08 GMT
>
> Also, I just realized here
> (http://www.erlang.org/doc/apps/erts/erl_ext_dist.html), cite:
> ===============
> A float is stored in string format. the format used in sprintf to
> format the float is "%.20e" (there are more bytes allocated than
> necessary)
> ===============
> So, every float requires 33 bytes of disk space. Not so efficient.

Reading the spec, I realized that passing {minor_version, 1} in the
term_to_binary options makes floats 9 bytes long instead of 33.
The change is just a search/replace in a few files.

I tested my data blob with this map function:

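fun({Doc}) ->
   % Encoded size with the default term_to_binary options
   Emit(<<"raw">>, byte_size(term_to_binary(Doc))),
   % Encoded size with minor_version 1 (floats as 8-byte IEEE 754 doubles)
   Emit(<<"raw_1">>, byte_size(term_to_binary(Doc, [{minor_version, 1}])))
end.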

The result was a ~10% decrease in space usage (~600MB instead of ~700MB).
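
The per-float difference can also be checked outside CouchDB. What follows is a minimal sketch, not CouchDB code: the module name float_size and the function sizes/1 are made up for illustration. It passes {minor_version, 0} explicitly because newer Erlang/OTP releases no longer default to the text encoding, and it assumes the release in use still accepts that value.

```erlang
-module(float_size).
-export([sizes/1]).

%% Hypothetical helper: returns {OldSize, NewSize}, the external-format
%% byte sizes of Term when floats are written as "%.20e" text
%% (minor_version 0) and as 8-byte IEEE 754 doubles (minor_version 1).
sizes(Term) ->
    Old = byte_size(term_to_binary(Term, [{minor_version, 0}])),
    New = byte_size(term_to_binary(Term, [{minor_version, 1}])),
    {Old, New}.
```

For a bare float, both sizes include the 1-byte version header that term_to_binary prepends. Inside a larger term each float costs a 1-byte tag plus 31 bytes of text in the old encoding, against a 1-byte tag plus 8 bytes in the new one, so the saving should be 23 bytes per float.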

Do I need to file a bug in Jira for it?



-- 
----------------
Best regards
Alexey Loshkarev
mailto:elf2001@gmail.com
