couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: View file size
Date Tue, 12 Jan 2010 15:58:01 GMT
On Tue, Jan 12, 2010 at 2:35 AM, Roger Binns <rogerb@rogerbinns.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Roger Binns wrote:
>> As a guess should I perhaps also order my documents so that the value
>> emitted in a view (in most cases one value always from a key named "name")
> >> is also in sorted order?  With some SQL magic during document generation
> >> I should be able to do that.
>
> So I did that.  The resulting view file is 15GB (was 27GB) and took 43
> minutes to generate (was 75 minutes).  Disk utilization started at 3% and
> climbed to 10% at the end (was 30%).  couchjs remained at ~25% cpu
> consumption and CouchDB was typically at 100% of one core (was around 110%).
>
> The obvious conclusion is the append only file format for views is a really
> bad thing.  As a format it is very good for data integrity but not efficient
> for performance or size.
>
> I do care about integrity of my documents, but the thing I care most about
> for my views is performance.  (The major reason for a view is that it is
> more performant than visiting every document.)  Losing data from a view is
> no big deal - it can be regenerated (assuming generation doesn't remain as
> slow as it is today :-).
>
> Consequently I'd suggest using a different file format for views that is
> space and performance oriented, and the only data integrity feature being
> the ability to tell if some or all of it is inconsistent (eg if there was
> abrupt shutdown).
>
> Roger
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iEYEARECAAYFAktMJkYACgkQmOOfHg372QTokwCfZgm1LZnNGKO2nCNGa8C6uwoR
> OpMAmwRTlTOrPm5Cjrkj8fyWR4xc4k7w
> =F59/
> -----END PGP SIGNATURE-----
>
>

Add the condition that corruption must be detectable as well, and
you're pretty much spot on. Implementing a more appropriate storage
format for indexes has been on my theoretical todo list for some time.
I poked at a couple of things for fun, but the other storage solutions
just weren't nearly as solid as the btree code, so I never finished a
complete implementation.
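
To make "corruption must be detectable" concrete, here is a minimal
sketch in Python. It is not CouchDB's view file format and not any of
the alternatives mentioned above. The block layout (a length plus a
CRC32 in front of each payload) and the names write_block and
read_blocks are assumptions made up for illustration. The idea is only
that a reader can tell a damaged or half-written index apart from a
good one and trigger a rebuild, without the format paying for full
append-only recovery.

```python
import struct
import zlib

# Hypothetical block header: payload length and CRC32 of the payload,
# both as big-endian unsigned 32-bit integers.
HEADER = struct.Struct(">II")


def write_block(buf: bytearray, payload: bytes) -> None:
    """Append one checksummed block to an in-memory index file."""
    buf += HEADER.pack(len(payload), zlib.crc32(payload))
    buf += payload


def read_blocks(buf: bytes):
    """Yield payloads in order; raise ValueError at the first bad block."""
    offset = 0
    while offset < len(buf):
        if offset + HEADER.size > len(buf):
            raise ValueError(f"truncated header at offset {offset}")
        length, crc = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = bytes(buf[offset:offset + length])
        # A short read (abrupt shutdown) or a checksum mismatch both
        # mean the index cannot be trusted from this point on.
        if len(payload) != length or zlib.crc32(payload) != crc:
            raise ValueError(f"corrupt block before offset {offset}")
        offset += length
        yield payload


if __name__ == "__main__":
    index = bytearray()
    write_block(index, b"alpha")
    write_block(index, b"beta")
    print(list(read_blocks(bytes(index))))

    index[-1] ^= 0xFF  # simulate a damaged final byte
    try:
        list(read_blocks(bytes(index)))
    except ValueError as exc:
        print("rebuild the view:", exc)
```

A reader that hits the ValueError would discard the index and
regenerate it from the documents, which is the trade-off described in
the quoted message.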

HTH,
Paul Davis
