couchdb-dev mailing list archives

From "Paul Joseph Davis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-623) File format for views is space and time inefficient - use a better one
Date Wed, 13 Jan 2010 19:28:54 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799891#action_12799891 ]

Paul Joseph Davis commented on COUCHDB-623:
-------------------------------------------

The consistency guarantee means that the view file format provides the same on-disk
consistency as the main database file (i.e., tail-append, MVCC style). It's not a reference
to keeping the main db and the view in sync. As you point out, doing things like querying
with stale=ok can give you a view result that does not reflect the most recent changes to
the database, changes from other clients, and so on.
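
To make the tail-append idea concrete, here is a minimal sketch in Python. It is a toy, not CouchDB's actual on-disk layout: the frame structure, the `append` and `last_commit` helpers, and the payload strings are all hypothetical. The point it illustrates is only that when every write goes to the end of the file and a commit record is written last, a reader can always fall back to the last intact commit after a crash.

```python
import io
import struct
import zlib

# Hypothetical frame header: payload length, CRC32 of payload, kind.
# kind 0 = data, kind 1 = commit record. Not CouchDB's real format.
FRAME = struct.Struct(">IIB")

def append(f, kind, payload):
    """Append one frame at the tail; existing bytes are never overwritten."""
    f.seek(0, io.SEEK_END)
    f.write(FRAME.pack(len(payload), zlib.crc32(payload), kind) + payload)

def last_commit(f):
    """Return the payload of the last intact commit frame, or None."""
    f.seek(0)
    found = None
    while True:
        head = f.read(FRAME.size)
        if len(head) < FRAME.size:
            break  # clean end of file, or a torn frame header
        length, crc, kind = FRAME.unpack(head)
        payload = f.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupt frame: ignore it and everything after
        if kind == 1:
            found = payload
    return found

f = io.BytesIO()
append(f, 0, b"doc-1")
append(f, 1, b"root@v1")
append(f, 0, b"doc-2")
append(f, 1, b"root@v2")
intact = last_commit(f)

# Simulate a power failure that cut off the last 3 bytes of the final write.
f.truncate(f.seek(0, io.SEEK_END) - 3)
after_crash = last_commit(f)
```

In this sketch the torn second commit is simply ignored, so a reader sees the state as of the first commit rather than a half-written one.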

> File format for views is space and time inefficient - use a better one
> ----------------------------------------------------------------------
>
>                 Key: COUCHDB-623
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-623
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core
>    Affects Versions: 0.10
>            Reporter: Roger Binns
>
> This was discussed on the dev mailing list over the last few days and noted here so it
> isn't forgotten.
> The main database file format is optimised for data integrity - not losing or mangling
> documents - and rightly so.
> That same append-only format is also used for views where it is a poor fit.  The more
> random the ordering of data supplied, the larger the btree.  The larger the keys (in bytes)
> the larger the btree.  As an example my 2GB of raw JSON data turns into a 3.9GB CouchDB database
> but a 27GB view file (before compacting to 900MB).  Since views are not replicated, this requires
> a disproportionate amount of disk space on each receiving server (not to mention I/O load).
> The format also affects view generation performance.  By loading my documents into CouchDB
> in an order by the most emitted value in views I was able to reduce load time from 75 minutes
> to 40 minutes with the view file size being 15GB instead of 27GB, but still very distant from
> the 900MB post compaction.
> Views are a performance enhancement.  They save you from having to visit every document
> when doing some queries.  The data within a view is generated and hence the only consequence
> of losing view data is a performance one and the view can be regenerated anyway.  Consequently
> the file format should be one that is optimised for performance and size.  The only integrity
> feature needed is the ability to tell that the view is potentially corrupt (eg the power failed
> while it was being generated/updated).
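
The claim in the issue description that random key order inflates an append-only btree can be illustrated with a rough model. The sketch below is a deliberate simplification and not how CouchDB builds views: it assumes a fixed-shape tree, a hypothetical `nodes_rewritten` helper, and made-up values for batch size, fanout, and depth. It only counts how many node copies would be appended if each batch of updates rewrote every node on the path from each touched leaf up to the root.

```python
import random

def nodes_rewritten(keys, batch_size=100, fanout=10, depth=4):
    """Count node copies appended by a toy append-only tree.

    Assumption: a node at level n covers fanout**n consecutive integer
    keys, and each batch appends one new copy of every node that has at
    least one key from the batch beneath it.
    """
    total = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        for level in range(1, depth + 1):
            span = fanout ** level
            # Distinct nodes at this level touched by the batch.
            total += len({k // span for k in batch})
    return total

keys = list(range(10_000))
sorted_cost = nodes_rewritten(keys)

shuffled = keys[:]
random.Random(0).shuffle(shuffled)  # fixed seed so runs are repeatable
random_cost = nodes_rewritten(shuffled)
```

With sorted input each batch lands in a handful of neighbouring leaves, so few nodes are copied; with shuffled input nearly every key in a batch hits a different leaf, so `random_cost` comes out far larger than `sorted_cost`. Those stale copies stay in the file until compaction, which is consistent with the gap between the pre- and post-compaction sizes reported above.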

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

