couchdb-dev mailing list archives

From "Brian Candler (JIRA)" <>
Subject [jira] Commented: (COUCHDB-465) Produce sequential, but unique, document id's
Date Thu, 13 Aug 2009 21:15:14 GMT


Brian Candler commented on COUCHDB-465:

I'd like to suggest an alternative algorithm for consideration.

- first 48 bits of the UUID is the time, in milliseconds, since 1 Jan 1970

- remaining 80 bits starts as a random value and increments from there,
  for example when doing a _bulk_docs insert (*)
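The scheme above can be sketched in a few lines of Ruby. This is an illustrative implementation, not couchtiny's actual code; the class and method names are mine:

```ruby
require 'securerandom'

# Sketch of the proposed id scheme: a 48-bit millisecond timestamp
# prefix followed by an 80-bit suffix that starts random and then
# increments. Names here are illustrative, not couchtiny's API.
class SequentialUUID
  MAX = 2**80

  def initialize
    # Seed the 80-bit suffix at a random point, kept at least 2^32
    # below the ceiling so a long run of increments cannot wrap.
    @counter = SecureRandom.random_number(MAX - 2**32)
  end

  def next_id
    millis = (Time.now.to_f * 1000).to_i
    # 48 bits = 12 hex digits; 80 bits = 20 hex digits; 32 in total.
    id = format('%012x%020x', millis, @counter)
    @counter += 1
    id
  end
end

gen = SequentialUUID.new
a = gen.next_id
b = gen.next_id
```

Because the timestamp prefix and counter are both emitted at fixed hex width, plain string comparison of the ids matches insertion order.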

I have been using this algorithm for a while, generated client-side - it's
in my 'couchtiny' ruby client.

I did it this way so as to get monotonically-increasing doc ids; a view with
equal keys will sort them in order of insertion into the DB. It also avoids
having to keep a separate "created_at" timestamp field, because you can just
get it from the id.

  def created_at
    Time.at(id[0,12].to_i(16) / 1000.0) rescue nil
  end

Of course, the fact I generate uids like this demonstrates that there's no
one-size-fits-all solution, but I just thought it was worth mentioning
because you should get the B-tree insertion boost as a side-effect too.



(*) It's your choice whether you want to re-randomize this when the next
millisecond comes along, or just leave it to increment as a serial number.
Even if you have multiple servers inserting documents into the same
database, the chances of them using the same serial number within the same
millisecond are infinitesimal, as long as they all start from an
independent random point within the 2^80 possibilities.

Wrapping would be very rare, but what I currently do is re-randomize for
each bulk insert, and choose a starting random value which is more than 2^32
away from the ceiling.
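That per-bulk-insert strategy can be sketched as follows; again the names are illustrative, not couchtiny's actual code:

```ruby
require 'securerandom'

CEILING = 2**80

# Pick a fresh random 80-bit suffix for each bulk insert, chosen at
# least 2^32 below the ceiling so the serial counter cannot wrap
# within any realistically sized batch.
def fresh_suffix
  SecureRandom.random_number(CEILING - 2**32)
end

# Generate ids for one bulk insert: shared timestamp prefix, fresh
# random base, consecutive serial numbers from there.
def bulk_ids(count)
  millis = (Time.now.to_f * 1000).to_i
  base = fresh_suffix
  (0...count).map { |i| format('%012x%020x', millis, base + i) }
end

ids = bulk_ids(3)
```

Each batch lands on an independent random point in the 2^80 space, so two servers colliding would need the same millisecond *and* overlapping random bases.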

> Produce sequential, but unique, document id's
> ---------------------------------------------
>                 Key: COUCHDB-465
>                 URL:
>             Project: CouchDB
>          Issue Type: Improvement
>            Reporter: Robert Newson
>         Attachments: sequence_id.patch
> Currently, if the client does not specify an id (POST'ing a single document or using
> _bulk_docs) a random 16 byte value is created. This kind of key is particularly brutal on
> b+tree updates and the append-only nature of couchdb files.
> Attached is a patch to change this to a two-part identifier. The first part is a random
> 12 byte value and the remainder is a counter. The random prefix is rerandomized when the
> counter reaches its maximum. The rollover in the patch is at 16 million but can obviously
> be changed. The upshot is that the b+tree is updated in a better fashion, which should
> lead to performance improvements.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
