lucene-solr-user mailing list archives

From Dennis Gearon <gear...@sbcglobal.net>
Subject RE: Ensuring stable timestamp ordering
Date Tue, 02 Nov 2010 04:41:43 GMT
How about a timestamp with a GUID appended to the end of it?
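
A minimal sketch of that idea, assuming Java and a string ID field. The class name TimestampGuidId, the method newId() and the "millis-GUID" layout are hypothetical choices for illustration, not anything Solr prescribes. Zero-padding the millisecond timestamp to a fixed width keeps plain string sorting chronological, and the GUID only breaks ties.

```java
import java.util.UUID;

public class TimestampGuidId {
    /**
     * Builds an ID of the form "<13-digit epoch millis>-<random UUID>".
     * The fixed-width prefix makes lexicographic order follow time order;
     * IDs created in the same millisecond are ordered arbitrarily (but
     * stably) by their UUID suffix.
     */
    public static String newId() {
        return String.format("%013d-%s", System.currentTimeMillis(), UUID.randomUUID());
    }

    public static void main(String[] args) {
        System.out.println(newId());
        System.out.println(newId());
    }
}
```

The price is size: each ID is 50 characters, and uniqueness rests on the random UUID rather than on the timestamp.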


Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn
from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Sun, 10/31/10, Toke Eskildsen <te@statsbiblioteket.dk> wrote:

> From: Toke Eskildsen <te@statsbiblioteket.dk>
> Subject: RE: Ensuring stable timestamp ordering
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Sunday, October 31, 2010, 12:18 PM
> Dennis Gearon [gearond@sbcglobal.net]
> wrote:
> > Even microseconds may not be enough on some really
> good, fast machine.
> 
> True, especially since the timer might not provide
> microsecond granularity although the returned value is in
> microseconds. However, a unique timestamp generator should
> keep track of the previous timestamp to guard against
> duplicates. Uniqueness can thus be guaranteed by waiting a
> bit or cheating on the decimals. With microseconds, one can
> produce 1 million timestamps / second. While I agree that
> duplicates within microseconds can occur on a fast machine,
> guaranteeing uniqueness by waiting should only be a
> performance problem when the number of duplicates is high.
> That's still a few years off, I think.
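
The "keep track of the previous timestamp" approach could look roughly like the sketch below. It takes the "cheating on the decimals" route rather than waiting. The class name UniqueMicros is hypothetical, and System.currentTimeMillis() stands in for a clock that reports microseconds but only ticks once per millisecond.

```java
import java.util.concurrent.atomic.AtomicLong;

public class UniqueMicros {
    // Last value handed out; starts at 0 so the first call uses the clock.
    private final AtomicLong last = new AtomicLong();

    /**
     * Returns a strictly increasing timestamp in microseconds.
     * If the clock has not advanced past the previous value, the previous
     * value plus one is returned instead, so no two calls ever see the
     * same number within this JVM.
     */
    public long next() {
        long nowMicros = System.currentTimeMillis() * 1000L;
        return last.updateAndGet(prev -> Math.max(nowMicros, prev + 1));
    }

    public static void main(String[] args) {
        UniqueMicros clock = new UniqueMicros();
        System.out.println(clock.next());
        System.out.println(clock.next());
    }
}
```

Under a sustained load above 1 million calls / second the returned values would drift ahead of the wall clock, which is the performance limit mentioned above. Uniqueness only holds inside one JVM.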
> 
> As Michael pointed out, using normal timestamps as unique
> IDs might not be such a great idea as it effectively locks
> index-building to a single JVM. By going the ugly route and
> expressing the time in nanos with only microsecond
> granularity and using the last 3 decimals for a builder ID,
> this could be fixed. Not very clean though, as the contract
> is not expressed in the data themselves but must
> nevertheless be obeyed by all builders to avoid collisions.
> It also raises the question of who should assign the builder
> IDs. Not trivial in an anarchistic setup where new builders
> can be added by different controllers.
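
For illustration, the "nanos with a builder ID in the last 3 decimals" layout might be sketched as follows. The class name BuilderStampedTime is hypothetical, and the sketch assumes builder IDs 0..999 are handed out by some external mechanism, which is exactly the coordination problem described above.

```java
public class BuilderStampedTime {
    private final int builderId; // assumed unique per builder, range 0..999
    private long lastMicros;     // last microsecond value used by this builder

    public BuilderStampedTime(int builderId) {
        if (builderId < 0 || builderId > 999) {
            throw new IllegalArgumentException("builderId must be in 0..999");
        }
        this.builderId = builderId;
    }

    /**
     * Returns a value in nanosecond units whose last three decimal digits
     * are the builder ID and whose remaining digits are a microsecond
     * timestamp that strictly increases within this builder.
     */
    public synchronized long next() {
        long nowMicros = System.currentTimeMillis() * 1000L;
        lastMicros = Math.max(nowMicros, lastMicros + 1);
        return lastMicros * 1000L + builderId;
    }

    public static void main(String[] args) {
        BuilderStampedTime builder = new BuilderStampedTime(42);
        System.out.println(builder.next());
    }
}
```

Two builders with different IDs can never produce the same value, even in the same microsecond, but nothing in the number itself reveals that contract.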
> 
> Pragmatists might use the PID % 1000 or similar for the
> builder ID as it does not require coordination, but this is
> where the Birthday Paradox hits us again: The chance of two
> processes on different machines having the same PID % 1000
> is about 10% if just 15 machines are used (1% for 5 machines,
> 50% for 37 machines). I don't like those odds, and that's
> assuming that the PIDs will be randomly distributed, which
> they won't. The risk could be lowered by reserving more
> decimals for the salt, but then we would decrease the
> maximum number of timestamps / second, still without
> guaranteed uniqueness. Guys a lot smarter than me have spent
> time on the unique ID problem and
> it's clearly not easy: Java's UUID takes up 128 bits.
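
The birthday-paradox odds can be checked with a few lines of Java. This is a sketch that assumes every builder draws its ID uniformly at random from 1000 values, which real PIDs do not. The class and method names are hypothetical.

```java
public class BuilderIdCollision {
    /**
     * Probability that at least two of `machines` builders share an ID
     * when each picks uniformly at random from `idSpace` possible IDs:
     * 1 - (idSpace/idSpace) * ((idSpace-1)/idSpace) * ...
     */
    public static double collisionProbability(int machines, int idSpace) {
        double allDistinct = 1.0;
        for (int i = 0; i < machines; i++) {
            allDistinct *= (double) (idSpace - i) / idSpace;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        for (int machines : new int[] {5, 15, 37}) {
            System.out.printf("%d machines: %.1f%%%n",
                    machines, 100 * collisionProbability(machines, 1000));
        }
    }
}
```

With an ID space of 1000 the results should land close to the 1%, 10% and 50% figures quoted above. Passing a larger idSpace shows how slowly the odds improve as more decimals are reserved.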
> 
> - Toke
