lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Killeen, Tom" <>
Subject parallel index building & searching multiple indexes
Date Mon, 11 Aug 2003 14:55:12 GMT
I am attempting to create approx 10 different Lucene indexes.  I'm trying to
create them at the same time by running multiple processes and each index is
written to a new directory.  Once I create more than one process - the
performance is very, very slow.  

Any sample code out there showing an efficient way to create multiple

Also, Any sample code out there to search the multiple indexes?


-----Original Message-----
From: Tatu Saloranta []
Sent: Monday, August 11, 2003 8:41 AM
To: Lucene Users List
Subject: Re: 2,147,483,647 max documents?

On Monday 11 August 2003 01:07, Kevin A. Burton wrote:
> Why was an int chosen to represent document handles?  Is there a reason
> for this?  Why wasn't a long chosen to represent document handles?  64
> bits seems like the obvious choice here except for a potentially bloated
> datastore.... (32 extra bits)

I can't speak for actual reasons (not being core Lucene developer), but the
general benefits of 32-bit ints vs. longs are:

- Better performance on pretty much any current architecture (even so-called
  64-bit CPUs often prefer 32-bit data access, and 64-bit representations
  more important for addressing).
  Also, smaller data set size is usually also good for performance
- Atomicity of access (read access can often be done without synchronizing);
  longs can not be atomically accessed in Java.

Another question is whether limited address space presents a real problem. 
Since Lucene can reuse doc ids (or rather, there is not persistent id per
doc id is just an index, and holes left by removed docs can be reused?), 
perhaps this is usually not much of an issue?

-+ Tatu +-

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message