couchdb-dev mailing list archives

From "Robert Newson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's
Date Thu, 13 Aug 2009 18:11:14 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742911#action_12742911 ]

Robert Newson commented on COUCHDB-465:
---------------------------------------

Thanks!

Guessability is a concern, which means this might need to be switchable. Perhaps couch_seq_generator
becomes couch_id_generator and an ini file chooses between the two strategies, defaulting
to the safest (but, for insertion performance, worst-case) new_uuid behavior. Producing keys that
are good for b+tree insertion necessarily makes them more guessable, since by design they must be
close to existing keys.
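A minimal sketch of the switchable generator proposed above, written in Python purely for illustration. The ini section `[couch_id_generator]`, the key `algorithm`, and the values `random`/`sequential` are hypothetical names chosen to mirror the proposal, not real CouchDB settings. When the setting is missing, the sketch falls back to the random (safest) strategy.

```python
import configparser
import os

# Hypothetical ini fragment; section and key names are illustrative only.
INI = """
[couch_id_generator]
algorithm = random
"""


def random_id():
    # Random 16-byte value, hex-encoded (32 characters).
    return os.urandom(16).hex()


def make_sequential(rollover=16_000_000):
    # Random 12-byte prefix plus a zero-padded hex counter (assumed encoding).
    prefix = os.urandom(12).hex()
    counter = 0

    def next_id():
        nonlocal prefix, counter
        if counter >= rollover:
            # Counter exhausted: pick a fresh random prefix and restart.
            prefix, counter = os.urandom(12).hex(), 0
        doc_id = f"{prefix}{counter:08x}"
        counter += 1
        return doc_id

    return next_id


def id_generator_from_config(text):
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    # Missing section or key falls back to the safest strategy.
    algorithm = cfg.get("couch_id_generator", "algorithm", fallback="random")
    if algorithm == "random":
        return random_id
    if algorithm == "sequential":
        return make_sequential()
    raise ValueError(f"unknown id algorithm: {algorithm!r}")
```

Defaulting to `random` keeps guessability opt-in: an operator has to ask for sequential IDs explicitly.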

I do owe some quantitative benchmarking to support the assertions in the description. I ran
a 10k insertion test with a small document, {content: "hello"}: the average insertion time
per document was 2ms with random IDs and 1ms with the patch. This was more to prove that I'd changed
*something* than to measure the actual improvement. I would expect to see improved
insertion rates across many scenarios, a smaller gap between uncompacted and compacted
size (barring document updates and deletes) because less of the b+tree is rewritten, and a smaller
post-compaction size than with random IDs. The exact extent of these improvements should be established
by a decent benchmark.
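Pending a real benchmark, a cheap way to see the locality difference is to count how often a new ID sorts after every existing key, i.e. lands on the right edge of the key order. This is a rough proxy rather than a b+tree model or a timing measurement. It assumes the two-part IDs are a hex-encoded 12-byte random prefix followed by an 8-digit zero-padded hex counter. The function name `right_edge_fraction` is hypothetical.

```python
import bisect
import os


def right_edge_fraction(ids):
    """Fraction of inserts that sort after every key inserted so far.

    A rough proxy for b+tree locality: right-edge inserts keep hitting the
    same rightmost path, while random keys land at arbitrary positions.
    """
    keys = []
    appended = 0
    for doc_id in ids:
        pos = bisect.bisect_left(keys, doc_id)
        if pos == len(keys):
            appended += 1
        keys.insert(pos, doc_id)  # keep the list sorted
    return appended / len(ids)


N = 10_000

# Baseline: random 16-byte IDs, hex-encoded.
random_ids = [os.urandom(16).hex() for _ in range(N)]

# Two-part IDs: one random prefix plus a zero-padded counter
# (N is far below the rollover point, so the prefix never changes).
prefix = os.urandom(12).hex()
sequential_ids = [f"{prefix}{i:08x}" for i in range(N)]

print(f"random:     {right_edge_fraction(random_ids):.4f}")
print(f"sequential: {right_edge_fraction(sequential_ids):.4f}")
```

The sequential case prints 1.0000, because every new ID sorts after all previous ones. The random case prints a value well under 1%.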



> Produce sequential, but unique, document id's
> ---------------------------------------------
>
>                 Key: COUCHDB-465
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-465
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Robert Newson
>         Attachments: sequence_id.patch
>
>
> Currently, if the client does not specify an id (POST'ing a single document or using
> _bulk_docs), a random 16-byte value is created. This kind of key (each new one lands at an
> arbitrary position in the tree) is particularly brutal on b+tree updates and the append-only
> nature of CouchDB files.
> Attached is a patch to change this to a two-part identifier. The first part is a random
> 12-byte value and the remainder is a counter. The random prefix is re-randomized when the counter
> reaches its maximum. The rollover in the patch is at 16 million but can obviously be changed.
> The upshot is that consecutive IDs sort next to each other, so the b+tree is updated in a more
> localized fashion, which should lead to performance benefits.
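A minimal Python sketch of the two-part scheme described above (the attached patch itself is not reproduced here). Several details are assumptions. The counter is encoded as 8 zero-padded hex digits, so the 12-byte prefix plus counter totals 16 bytes like the random ID. The counter starts at 0. The rollover is taken literally as 16,000,000. `SequentialIdGenerator` and `next_id` are hypothetical names. Zero-padding matters because it keeps lexicographic order equal to numeric order, so IDs sharing a prefix sort consecutively.

```python
import os


def random_id():
    # Baseline: a random 16-byte value, hex-encoded (32 characters).
    return os.urandom(16).hex()


class SequentialIdGenerator:
    """Hypothetical sketch: random 12-byte prefix + zero-padded counter."""

    def __init__(self, rollover=16_000_000, prefix_bytes=12):
        self.rollover = rollover
        self.prefix_bytes = prefix_bytes
        self._new_prefix()

    def _new_prefix(self):
        # Fresh random prefix; the counter restarts at zero.
        self.prefix = os.urandom(self.prefix_bytes).hex()
        self.counter = 0

    def next_id(self):
        if self.counter >= self.rollover:
            # Counter reached its maximum: re-randomize the prefix.
            self._new_prefix()
        # 8 hex digits = 4 bytes; zero-padding keeps string order numeric.
        doc_id = f"{self.prefix}{self.counter:08x}"
        self.counter += 1
        return doc_id
```

For example, with `SequentialIdGenerator(rollover=3)` the first three IDs share a prefix and increase, and the fourth starts over with a new prefix and a counter of `00000000`.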

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

