lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tatu Saloranta <>
Subject Re: 2,147,483,647 max documents?
Date Mon, 11 Aug 2003 13:41:07 GMT
On Monday 11 August 2003 01:07, Kevin A. Burton wrote:
> Why was an int chosen to represent document handles?  Is there a reason
> for this?  Why wasn't a long chosen to represent document handles?  64
> bits seems like the obvious choice here except for a potentially bloated
> datastore.... (32 extra bits)

I can't speak for actual reasons (not being core Lucene developer), but the
general benefits of 32-bit ints vs. longs are:

- Better performance on pretty much any current architecture (even so-called
  64-bit CPUs often prefer 32-bit data access, and 64-bit representations are
  more important for addressing).
  Also, smaller data set size is usually also good for performance (caching).
- Atomicity of access (read access can often be done without synchronizing);
  longs can not be atomically accessed in Java.

Another question is whether limited address space presents a real problem. 
Since Lucene can reuse doc ids (or rather, there is not persistent id per se? 
doc id is just an index, and holes left by removed docs can be reused?), 
perhaps this is usually not much of an issue?

-+ Tatu +-

View raw message