couchdb-user mailing list archives

From Jason Smith <...@iriscouch.com>
Subject Re: Size of couchdb documents
Date Thu, 15 Mar 2012 13:53:49 GMT
On Thu, Mar 15, 2012 at 8:38 AM, Daniel Gonzalez <gonvaled@gonvaled.com> wrote:
> I have the following document in a couchdb database:
>
> {
>   "_id": "000013a7-4df6-403b-952c-ed767b61554a",
>   "_rev": "1-54dc1794443105e9d16ba71531dd2850",
>   "tags": [
>       "auto_import"
>   ],
>   "ZZZZZZZZZZZ": "910111",
>   "UUUUUUUUUUUUU": "OOOOOOOOO",
>   "RECEIVING_OPERATOR": "073",
>   "type": "XXXXXXXXXXXXXXXXXXX",
>   "src_file": "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
> }
>
> This JSON file takes exactly 319 bytes if saved in my local
> filesystem. My documents are all like this (give or take a couple of
> bytes, since some of the fields have varying lengths).
>
> In my database I currently have around 6 million documents, and they
> use 15 GB. That gives around 2.5 KBytes/document. That means that the
> documents are taking 8 times more space in CouchDB than they would on
> disk.

Hi, Daniel. Excellent question!

Ask yourself: how much space does a 319-byte file *really* consume on a disk?

It must be more than 319 bytes because the operating system must store
file metadata too. And even the file data occupies a 4KB block.

On a Linux ext3 filesystem, there is the superblock (and its copies),
the block group descriptor table, block bitmaps, inode bitmaps,
inodes, and then of course data blocks--usually 4 kilobytes a pop.
Whoops! That exceeds the CouchDB average already. So what is the
storage cost of a 319-byte file?
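
To put rough numbers on that comparison, here is a minimal Python sketch.
It assumes a 4 KB block size and reads "15 GB" as decimal gigabytes; the
names (blocks_on_disk and the constants) are made up for illustration, and
it counts data blocks only, ignoring inodes and the other filesystem
metadata listed above.

```python
import math

DOC_BYTES = 319          # size of the JSON file from the question
BLOCK_BYTES = 4096       # assumed filesystem block size (4 KB)
DB_BYTES = 15 * 10**9    # 15 GB, read as decimal gigabytes (assumption)
DOC_COUNT = 6_000_000    # "around 6 million" documents


def blocks_on_disk(size_bytes, block_bytes=BLOCK_BYTES):
    """Bytes of data blocks a file occupies, rounded up to whole blocks."""
    return math.ceil(size_bytes / block_bytes) * block_bytes


couch_per_doc = DB_BYTES / DOC_COUNT     # average bytes per document in CouchDB
fs_per_doc = blocks_on_disk(DOC_BYTES)   # data-block bytes for one small file

print(f"CouchDB:    {couch_per_doc:.0f} bytes per document")
print(f"Filesystem: {fs_per_doc} bytes per document (data blocks only)")
```

Under those assumptions, one file per document would already need more
data-block space than the per-document average Daniel measured in CouchDB.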

CouchDB is the same. But running on top of the OS, it can't as easily
hide its metadata from the census.

Having said all of that, the CouchDB file format is indeed bloated,
particularly with numbers. The upcoming 1.2 release addresses that,
with several degrees of data compression supported.

I think most people are initially shocked by CouchDB's time and space
performance. However, if you consider its amortized costs in real-world
usage, it is a capable choice for many scenarios.

-- 
Iris Couch
